python - How to handle links that contain spaces in Python

Question

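I am trying to extract links from a web page and then open them in my web browser. My Python program extracts the links successfully, but some of the links contain spaces and cannot be opened with the request module.

For example, example.com/A, B C will not open with the request module. But if I convert it to example.com/A,%20B%20C, it opens. Is there a simple way in Python to replace the spaces with %20?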

`http://example.com/A, B C` ---> `http://example.com/A,%20B%20C`

I want to convert every link that contains spaces into the format above.

score 5 · Accepted Answer

urlencode actually takes a dictionary, for example:

>>> urllib.urlencode({'test':'param'})
'test=param'
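For comparison, here is a minimal Python 3 sketch of the difference between the two calls, assuming the `urllib.parse` module (where these functions live in Python 3). `urlencode` builds a query string from a mapping, while `quote` escapes a single string. The `q` key is made up for illustration.

```python
from urllib.parse import urlencode, quote

# urlencode expects a mapping (or a sequence of pairs) and builds a query string.
# The 'q' entry is a hypothetical parameter added for this example.
query = urlencode({'test': 'param', 'q': 'A, B C'})
print(query)    # test=param&q=A%2C+B+C

# quote works on one string and percent-encodes unsafe characters.
escaped = quote('A, B C')
print(escaped)  # A%2C%20B%20C
```

Note that `urlencode` turns spaces into `+`, which is only meaningful inside a query string, not inside a path.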

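What you actually need is something like this (Python 2):

import urllib
import urlparse

def url_fix(s, charset='utf-8'):
    # Python 2: encode unicode input to bytes before quoting
    if isinstance(s, unicode):
        s = s.encode(charset, 'ignore')
    scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
    # Percent-encode the path; keep '/' and existing '%' escapes intact
    path = urllib.quote(path, '/%')
    # Encode the query string; spaces become '+', while ':', '&', '=' are kept
    qs = urllib.quote_plus(qs, ':&=')
    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))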

Then:

>>> url_fix('http://example.com/A, B C')
'http://example.com/A%2C%20B%20C'

Taken from How can I normalize a URL in python
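The result above encodes the comma as `%2C`, while the question asked for `A,%20B%20C`. Both forms are normally accepted by servers, but if the comma should stay literal, one option is to add it to the `safe` characters. This is a Python 3 sketch under that assumption; the helper name `escape_path_keep_commas` is hypothetical and not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def escape_path_keep_commas(url):
    """Percent-encode the path of `url`, leaving '/', '%' and ',' untouched.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # safe='/%,' keeps path separators, existing escapes and commas as-is
    path = quote(parts.path, safe='/%,')
    return urlunsplit(parts._replace(path=path))

print(escape_path_keep_commas('http://example.com/A, B C'))
# http://example.com/A,%20B%20C
```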

score 1 · Accepted Answer


Use URL encoding:

import urllib
urllib.urlencode(yourstring)

answered 2015-10-10T02:50:58.630

score 0 · Accepted Answer

A working Python 3 version of @rofls's answer.

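import urllib.parse as urlparse

def url_fix(s):
    scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
    # Percent-encode the path; keep '/' and existing '%' escapes intact
    path = urlparse.quote(path, '/%')
    # Encode the query string; spaces become '+', while ':', '&', '=' are kept
    qs = urlparse.quote_plus(qs, ':&=')
    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))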
