The difference between urlresponse.url.splitt and urlparse

Time: 2017-10-29 02:41

python3 urlsplit

Analyzing the difference between the query strings and hashes of two URLs - 百度知道 (Baidu Zhidao)
The first answer compares the two URLs in PHP. It sends a Content-Type header with charset=utf-8, then takes two Baidu image-search URLs ($url1 and $url2), both carrying the query parameter word=%E5%91%A8%E6%9D%B0%E4%BC%A6 and a fragment that begins with #z=. Each URL is parsed with parse_url() into $url1_arr and $url2_arr. The 'query' and 'fragment' entries are then split into arrays with parse_str() ($url1_query_arr, $url2_query_arr, $url1_fragment_arr, $url2_fragment_arr), and the arrays are compared with array_diff() to give $query_diff and $fragment_diff, which are printed with print_r().
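Check it with code. Put the following in your HTML and open it in a browser:
<a href="#foo">hyperlink</a><br />
<a href="javascript:alert(window.location.href)">href</a><br />
<a href="javascript:alert(window.location.hash)">hash</a>
Test steps: click "hyperlink" and the URL in the address bar changes: "#foo" is appended to it. Click "href" and the alert shows the URL from the address bar. Click "hash" and the alert shows #foo. The test makes the difference clear:
href: sets or gets the entire URL as a string.
hash: sets or gets the part of the href attribute that follows the "#" sign.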
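The query-string and hash comparison discussed above can also be sketched in Python. This is only an illustration under stated assumptions: the two URLs use the placeholder host example.com, the helper names `diff_dicts` and `compare_urls` are hypothetical, and the fragment is assumed to hold `key=value` pairs like the ones in the Baidu image-search URLs.

```python
from urllib.parse import urlsplit, parse_qs


def diff_dicts(a, b):
    """Return the keys whose values differ, mapped to (value_in_a, value_in_b)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def compare_urls(url1, url2):
    """Compare the query strings and the fragments (hashes) of two URLs."""
    p1, p2 = urlsplit(url1), urlsplit(url2)
    # parse_qs turns "a=1&b=2" into {"a": ["1"], "b": ["2"]}
    query_diff = diff_dicts(parse_qs(p1.query, keep_blank_values=True),
                            parse_qs(p2.query, keep_blank_values=True))
    # Assumption: the fragment is also written as key=value pairs
    fragment_diff = diff_dicts(parse_qs(p1.fragment, keep_blank_values=True),
                               parse_qs(p2.fragment, keep_blank_values=True))
    return query_diff, fragment_diff


# Placeholder URLs, not the ones from the original question
url1 = "http://example.com/i?tn=baiduimage&ie=utf-8&fr=top#z=9&width=0"
url2 = "http://example.com/i?tn=baiduimage&ie=utf-8&fm=result#z=2&width=0"

query_diff, fragment_diff = compare_urls(url1, url2)
print(query_diff)
print(fragment_diff)
```

Unlike PHP's array_diff(), this sketch reports differences in both directions, which seemed more useful for a side-by-side comparison.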
Python 3 : Why would you use urlparse/urlsplit - Stack Overflow
I'm not exactly sure what these modules are used for.
I get that they split the respective url into its components, but why would that be useful, or what is an example of when to use urlparse?
Use urlparse only if you need the params component. Below I explain what params are for.
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the
URL. This should generally be used instead of urlparse() if the more
recent URL syntax allowing parameters to be applied to each segment of
the path portion of the URL (see RFC 2396) is wanted.
The hostname is always useful to store in a variable for later use, or as a base to which you add a path, params, or a query to reach the page you want while scraping.
Regarding parameters:
FYI, RFC 2396 says this about parameters in URLs:
Extensive testing of current client applications demonstrated that the
majority of deployed systems do not use the ";" character to indicate
trailing parameter information, and that the presence of a semicolon
in a path segment does not affect the relative parsing of that
segment. Therefore, parameters have been removed as a separate
component and may now appear in any path segment. Their influence has
been removed from the algorithm for resolving a relative URI
reference.
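A minimal sketch of the difference on a URL that actually contains a ";" parameter. The URLs are made-up placeholders on example.com; the calls are the Python 3 `urllib.parse` versions of the functions discussed here.

```python
from urllib.parse import urlparse, urlsplit

# Placeholder URL whose last path segment carries a ";" parameter
url = "http://example.com/docs/report;type=pdf?lang=en#intro"

parsed = urlparse(url)   # 6 fields: params is split off the path
split = urlsplit(url)    # 5 fields: the ";type=pdf" part stays in the path

print(parsed.path, parsed.params)
print(split.path)

# urlparse only splits parameters off the LAST path segment;
# a ";" in an earlier segment is left untouched in the path.
inner = urlparse("http://example.com/docs;v=2/report")
print(inner.path, repr(inner.params))
```

If the URLs you handle never use ";" parameters, both functions give the same path and urlsplit simply has one field fewer.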
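Splitting a URL into its components is useful in scraping.
For example, if the URL is /products/women?color=green,
parsing it gives you the path /products/women and the query color=green separately (the params field is empty here, because the URL contains no ";"). You can then change the last path segment to men to get /products/men?color=green, and likewise for kids, girl, boy, and so on.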
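The scraping idea can be sketched roughly like this. The host example.com and the helper name `swap_last_segment` are assumptions made for the illustration; only `urlsplit`, `urlunsplit`, and the named tuple's `_replace` method come from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_last_segment(url, new_segment):
    """Return url with its last path segment replaced, keeping the query."""
    parts = urlsplit(url)
    # "/products/women" -> head "/products", last segment "women"
    head, _, _ = parts.path.rpartition("/")
    # SplitResult is a named tuple, so _replace builds a modified copy
    return urlunsplit(parts._replace(path=head + "/" + new_segment))


base = "http://example.com/products/women?color=green"
urls = [swap_last_segment(base, name) for name in ("men", "kids", "girl", "boy")]
for u in urls:
    print(u)
```

The hostname and the query are reused unchanged, which is the point of keeping the parsed parts around instead of editing the URL string by hand.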
[Repost] Learning Python 2.7 urlparse - 月未央 - 博客园 (cnblogs)
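The urlparse module mainly splits a URL into 6 parts and returns them as a tuple. It can also reassemble the parts into a URL. Its main functions are urljoin, urlsplit, urlunsplit, urlparse, and so on.
urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
    Parses urlstring into 6 components and returns the tuple (scheme, netloc, path, parameters, query, fragment). The result is actually a namedtuple, a subclass of tuple, so each part of the URL can be accessed by attribute name or by index. Every component is a string and may be empty. Components are not broken into smaller pieces, and % escapes are not expanded. The delimiters are not part of the result, except for a leading slash in the path component, which is kept if present. The returned tuple is very useful: it can be used, for example, to determine the network protocol (HTTP, FTP, and so on), the server address, the file path, and so on.
>>> import urlparse
>>> url=urlparse.urlparse('http:///index.php?username=guol')
>>> print url
ParseResult(scheme='http', netloc='', path='/index.php', params='', query='username=guol', fragment='')
>>> print url.netloc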
urlparse.urlunparse(parts)
    Builds a URL from a tuple like the one urlparse returns. Given the tuple (scheme, netloc, path, parameters, query, fragment), it reassembles a correctly formatted URL that Python's other HTML-handling modules can use.
>>> import urlparse
>>> url=urlparse.urlparse('http:///index.php?username=guol')
>>> print url
ParseResult(scheme='http', netloc='', path='/index.php', params='', query='username=guol', fragment='')
>>> u=urlparse.urlunparse(url)
>>> print u
http:///index.php?username=guol
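urlparse.urlsplit(urlstring[, scheme[, allow_fragments]])
    Parses urlstring and returns a tuple of 5 string items: scheme, network location, path, query, and fragment. When allow_fragments is False, the last item of the tuple is always empty, whether or not urlstring contains a fragment; any "#..." text is then left in the preceding component. Items missing from the URL are empty as well. urlsplit() is much like urlparse(), but it does not split the params from the URL. It suits URLs that follow RFC 2396, where every path segment may carry parameters. The returned tuple therefore has only 5 elements.
>>> import urlparse
>>> url=urlparse.urlparse('http:///index.php?username=guol')
>>> print url
ParseResult(scheme='http', netloc='', path='/index.php', params='', query='username=guol', fragment='')
>>> url=urlparse.urlsplit('http:///index.php?username=guol')
>>> print url
SplitResult(scheme='http', netloc='', path='/index.php', query='username=guol', fragment='')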
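The two optional arguments described above, scheme and allow_fragments, can be tried out as follows. This is a small sketch using the Python 3 location of the function (`urllib.parse.urlsplit`; the urlparse module was renamed in Python 3) and placeholder URLs on example.com.

```python
from urllib.parse import urlsplit

url = "http://example.com/index.php?username=guol#top"

with_fragment = urlsplit(url)
without_fragment = urlsplit(url, allow_fragments=False)

# Default: "#top" becomes the fragment
print(with_fragment.query, with_fragment.fragment)
# allow_fragments=False: fragment is empty, "#top" stays in the query
print(without_fragment.query, repr(without_fragment.fragment))

# The scheme argument is only a default for URLs that do not name one
defaulted = urlsplit("//example.com/a", scheme="https")
print(defaulted.scheme, defaulted.netloc, defaulted.path)
```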
urlparse.urlunsplit(parts)
    urlunsplit combines the values returned by urlsplit() back into a URL.
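A short round-trip sketch for urlunsplit, again with the Python 3 import path and a placeholder host. It also shows that any 5-item sequence works, not only the result of urlsplit.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://example.com/index.php?username=guol"

parts = urlsplit(url)
rebuilt = urlunsplit(parts)   # reverses urlsplit
print(rebuilt)

# A plain 5-tuple in the order (scheme, netloc, path, query, fragment)
built = urlunsplit(("https", "example.com", "/search", "q=python", "results"))
print(built)
```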
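urlparse.urljoin(base, url[, allow_fragments])
    urljoin joins URLs: it takes base as the base address and combines it with the relative address in url to form an absolute URL. It is especially handy when you handle several files at the same location by attaching new file names to a base URL. Note that if the base address does not end with "/", the rightmost part of the base URL is replaced by the relative path. To keep the final directory in the path, make sure the base URL ends with "/".
>>> import urlparse
>>> urlparse.urljoin('/tieba','index.php')
'/index.php'
>>> urlparse.urljoin('/tieba/','index.php')
'/tieba/index.php'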
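urlparse parses URLs, which have the following format: protocol :// hostname[:port] / path / [;parameters][?query]#fragment. The ;parameters part is used to specify special parameters and is rarely seen; at least I have hardly ever come across it.
1: urlparse quick start
    urlparse(url, scheme='', allow_fragments=True): parses <scheme>://<netloc>/<path>;<params>?<query>#<fragment> into a 6-tuple: (scheme, netloc, path, params, query, fragment). The return value is a tuple subclass that defines some attributes, such as netloc. urlunparse is the inverse operation.
    urlsplit(url, scheme='', allow_fragments=True): parses <scheme>://<netloc>/<path>?<query>#<fragment> into a 5-tuple: (scheme, netloc, path, query, fragment). urlunsplit is the inverse operation. It is much like urlparse, minus the rarely used params component; urlparse is itself implemented by calling urlsplit. If the URL has no [;parameters], urlsplit is the better choice: it is more explicit and more concise.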
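As a quick illustration of the attributes mentioned above, here is a sketch using Python 3's `urllib.parse`. The URL, user name, and password are invented placeholders.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:secret@example.com:8080/a/b?x=1#top")

# Fields can be read by name or by index, since the result is a tuple
print(parts.scheme, parts[0])
print(parts.netloc)

# Extra read-only attributes derived from netloc
print(parts.hostname)   # host name without credentials or port
print(parts.port)       # port as an int, or None if absent
print(parts.username)
```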
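2: Source code analysis
    The objects returned by the two functions above are tuples that also have their own methods, because the result classes inherit from tuple.
    SplitResult is the return value of urlsplit and ParseResult is the return value of urlparse; the main difference between them is whether the params field is present. The code also shows how to extend tuple: tuple accepts any sequence as its argument, not just a tuple object, and __new__ must return the constructed object. We can therefore write our own extended tuple that accepts a list object.
    Note how BaseResult uses __slots__. Defining __slots__ stops instances of a class from being given a __dict__, and it is the __dict__ that makes adding arbitrary attributes easy. BaseResult sets __slots__ to an empty tuple, so arbitrary attributes cannot be added to the returned object. The custom class we just defined does not set __slots__, so it behaves differently.
BaseResult itself is worth reading in the source.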
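The tuple-extension and `__slots__` points can be checked with a small experiment. The classes `SlottedResult` and `OpenResult` below are hypothetical stand-ins written for this sketch; they are not the library's own BaseResult code.

```python
from operator import itemgetter


class SlottedResult(tuple):
    """Tuple subclass with empty __slots__: instances get no __dict__."""
    __slots__ = ()

    def __new__(cls, parts):
        # tuple accepts any sequence (here a list); __new__ returns the object
        return tuple.__new__(cls, parts)

    scheme = property(itemgetter(0))
    netloc = property(itemgetter(1))


class OpenResult(tuple):
    """Same idea without __slots__: instances have a __dict__."""

    def __new__(cls, parts):
        return tuple.__new__(cls, parts)


slotted = SlottedResult(["http", "example.com"])
try:
    slotted.note = "extra"          # no __dict__, so this fails
    slotted_accepts_attrs = True
except AttributeError:
    slotted_accepts_attrs = False

open_result = OpenResult(["http", "example.com"])
open_result.note = "extra"          # works: stored in the instance __dict__

print(slotted.scheme, slotted.netloc, slotted_accepts_attrs)
print(open_result.note)
```

Besides blocking stray attributes, leaving out the per-instance `__dict__` keeps each result object as small as a plain tuple.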
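    urljoin(base, url, allow_fragments=True): builds a URL from a base and a relative URL. I remember writing this by hand in a project (embarrassing), when a ready-made version was available here.
    urldefrag(url): removes the fragment from url, that is, it strips the "#" and everything after it, and returns the stripped URL together with the fragment.
    _splitnetloc(url, start=0): extracts the netloc from url.
    One point worth mentioning: the urlparse module uses no regular expressions to match data anywhere. It is entirely sequential string parsing, and well worth reading.
A very well-written blog post. I learned from it, so I am reposting it here as a bookmark.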
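The public helpers in this list can be exercised as follows. This is a sketch with Python 3's `urllib.parse` and placeholder URLs; `_splitnetloc` is a private helper and is left out on purpose.

```python
from urllib.parse import urldefrag, urljoin

# urldefrag returns a 2-item named tuple: (url without fragment, fragment)
result = urldefrag("http://example.com/page?x=1#section")
print(result.url, result.fragment)

# urljoin: the trailing "/" on the base decides whether "tieba" is kept
no_slash = urljoin("http://example.com/tieba", "index.php")
with_slash = urljoin("http://example.com/tieba/", "index.php")
print(no_slash)
print(with_slash)
```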