A path on its own (//path) is not valid: it confuses the function and is interpreted as a hostname.
https://www.rfc-editor.org/rfc/rfc3986.html#section-3.3
If a URI does not contain an authority component, then the path cannot begin with two slash characters ("//").
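To make that rule concrete, here is a small sketch, assuming Python 3 (where the functions of the Python 2 urlparse module live in urllib.parse). It shows that a reference starting with two slashes is parsed as an authority rather than a path, and what urljoin does with it as a result.

```python
from urllib.parse import urlparse, urljoin

base = 'http://www.example.com//path?foo=bar'

# A reference that starts with "//" is a network-path reference:
# the text after the slashes is read as the host, not as a path.
ref = urlparse('//path')
print(ref.netloc)  # path
print(ref.path)    # (empty string)

# Joining it onto the base therefore replaces the host.
joined = urljoin(base, '//path')
print(joined)  # http://path
```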
I don't particularly like either of these solutions, but they both work:
import re
import urlparse
testurl = 'http://www.example.com//path?foo=bar'
parsed = list(urlparse.urlparse(testurl))
parsed[2] = re.sub("/{2,}", "/", parsed[2]) # replace two or more / with one
cleaned = urlparse.urlunparse(parsed)
print cleaned
# http://www.example.com/path?foo=bar
print urlparse.urljoin(
    testurl,
    urlparse.urlparse(cleaned).path)
# http://www.example.com/path
Depending on what you are doing, you could do the joining manually:
import re
import urlparse
testurl = 'http://www.example.com//path?foo=bar'
parsed = list(urlparse.urlparse(testurl))
newurl = ["" for i in range(6)] # could urlparse another address instead
# Copy first 3 values from
# ['http', 'www.example.com', '//path', '', 'foo=bar', '']
for i in range(3):
    newurl[i] = parsed[i]
# Rest are blank
for i in range(3, 6):
    newurl[i] = ''
print urlparse.urlunparse(newurl)
# http://www.example.com//path
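If collapsing the slashes is what you want, the regex approach can be wrapped in a small helper. This is only a sketch, assuming Python 3 (urllib.parse instead of urlparse); the name collapse_slashes is made up for illustration and is not part of any library.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url):
    """Return url with every run of '/' in the path reduced to one '/'.

    Hypothetical helper: only the path component is touched, so the
    '//' after the scheme and anything in the query string are kept.
    """
    parts = urlsplit(url)
    clean_path = re.sub(r'/{2,}', '/', parts.path)
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(path=clean_path))

cleaned = collapse_slashes('http://www.example.com//path?foo=bar')
print(cleaned)  # http://www.example.com/path?foo=bar
```

Keep in mind that this changes the URL: a server is free to treat //path and /path as different resources, so only collapse the slashes when you know they are equivalent for your target.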