python - Building a query string with urlencode in Python

Question

I'm trying to build a URL so that I can send a GET request to it using the urllib module.

Suppose my final_url should be

url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"

To achieve this, I tried the following:

>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data":initial_url,"search":search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'

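Now, if you compare my query_string with the format of final_url, you can observe two things:

1) The order of the parameters is reversed: instead of data=()&search= it is search=()&data=

2) urlencode also encoded the + in Generate+value

I believe the first change is due to the arbitrary ordering of dictionaries. So I thought of using OrderedDict to reverse the dictionary. Since I'm using Python 2.6.5, I did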

pip install ordereddict

But I can't use it in my code. When I try:

>>> od = OrderedDict((('a', 'first'), ('b', 'second')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'OrderedDict' is not defined

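So, my question is: what is the correct way to use OrderedDict in Python 2.6.5, and how do I make urlencode ignore the + in Generate+value?

Also, is this the right approach for building the URL?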

score 27 · Accepted Answer

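You don't need to worry about + being encoded; the server restores it when it unescapes the URL. The order of named parameters doesn't matter either.

As for OrderedDict, it is not a Python builtin. You should import it from collections (it is available there from Python 2.7 onward):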

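from urllib import urlencode, quote
# from urllib.parse import urlencode, quote  # python3
from collections import OrderedDict  # Python 2.7+
# from ordereddict import OrderedDict  # Python 2.6, after `pip install ordereddict`

initial_url = "http://www.stackoverflow.com"
search = "Generate+value"
# Pass a list of pairs: keyword arguments do not reliably keep their order on older Pythons.
query_string = urlencode(OrderedDict([("data", initial_url), ("search", search)]))
url = 'www.example.com/find.php?' + query_string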

If your Python is too old and has no OrderedDict in the collections module, use:

# parameters is a dict of query arguments; sorted() gives a fixed, alphabetical key order
encoded = "&".join("%s=%s" % (key, quote(parameters[key], safe="+"))
    for key in sorted(parameters.keys()))

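Either way, the order of the parameters should not matter.

Note the safe argument to quote. It prevents + from being escaped, but that means the server will interpret Generate+value as Generate value. You can escape + manually by writing %2B and marking % as a safe character.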
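The two options just described can be sketched as follows. This is a minimal illustration written for Python 3, where quote and unquote_plus live in urllib.parse (in Python 2 they are in urllib); the variable names are made up for the example, and unquote_plus stands in for whatever decoding the server actually performs.

```python
from urllib.parse import quote, unquote_plus

search = "Generate+value"

# Option 1: mark "+" as safe, so quote() leaves it untouched.
kept_plus = quote(search, safe="+")

# Option 2: escape "+" by hand, then mark "%" as safe so quote()
# does not escape the "%" of "%2B" a second time.
manual = quote(search.replace("+", "%2B"), safe="%")

# Assumption: the server decodes form-style values, as unquote_plus does.
decoded_kept = unquote_plus(kept_plus)
decoded_manual = unquote_plus(manual)

print(kept_plus, "->", decoded_kept)
print(manual, "->", decoded_manual)
```

Marking % as safe also leaves any literal % in the input unescaped, so the second option is only appropriate when the value contains no % of its own.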

score 4 · Accepted Answer

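First, the order of parameters in an HTTP request should be completely irrelevant. If it isn't, the parsing library on the other side is doing something wrong.

Second, of course the + is encoded. + is used as a placeholder for a space in an encoded URL, so if your original string contains a +, it has to be escaped. urlencode expects an unencoded string; you can't pass it a string that is already encoded.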
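A short sketch of that behaviour, assuming Python 3 (urllib.parse; in Python 2 urlencode lives in urllib and parse_qs in urlparse). It shows that a space becomes +, a literal + becomes %2B, and decoding restores each original value.

```python
from urllib.parse import urlencode, parse_qs

# An unencoded value with a space: the space becomes "+".
with_space = urlencode([("search", "Generate value")])

# A value that already contains "+": the "+" is escaped as "%2B".
with_plus = urlencode([("search", "Generate+value")])

# Decoding restores each original value.
roundtrip_space = parse_qs(with_space)["search"][0]
roundtrip_plus = parse_qs(with_plus)["search"][0]

print(with_space, "->", roundtrip_space)
print(with_plus, "->", roundtrip_plus)
```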

score 1 · Accepted Answer

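A few comments on the question and the other answers:

If you want to preserve the order, pass urllib.urlencode an ordered sequence of k/v pairs instead of a mapping (dict). When you pass in a dict, urlencode just calls foo.items() to get an iterable sequence.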

# urllib.urlencode accepts a mapping or sequence
# the output of this can vary, because `items()` is called on the dict
urllib.urlencode({"data": initial_url,"search": search})
# the output of this will not vary
urllib.urlencode((("data", initial_url), ("search", search)))

You can also pass a second argument, doseq, to adjust how iterable values are handled.
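As an illustration of doseq, here is a small sketch using Python 3's urllib.parse.urlencode (the Python 2 function in urllib takes the same argument). The parameter names and values are made up for the example.

```python
from urllib.parse import urlencode

params = [("foo", [1, 2, 3]), ("bar", "x")]

# Without doseq, the list is converted with str() and encoded as one value.
single = urlencode(params)

# With doseq=True, each element of the list becomes its own key=value pair.
repeated = urlencode(params, doseq=True)

print(single)
print(repeated)
```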

The order of parameters is not irrelevant. Take these two URLs:

https://example.com?foo=bar&bar=foo
https://example.com?bar=foo&foo=bar

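An HTTP server should treat the order of these parameters as irrelevant, but a function designed to compare URLs will not. To compare URLs safely, these parameters need to be sorted.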
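One way such a comparison could be done is sketched below, assuming Python 3's urllib.parse. The helper normalize_query is hypothetical, not part of any library; it sorts by key only, so repeated keys keep their relative order.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_query(url):
    """Return url with its query parameters sorted by key (hypothetical helper)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # sorted() is stable, so repeated keys keep their relative order.
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return urlunsplit(parts._replace(query=urlencode(pairs)))

a = normalize_query("https://example.com?foo=bar&bar=foo")
b = normalize_query("https://example.com?bar=foo&foo=bar")
same = (a == b)
print(a, b, same)
```

Comparing the raw strings would report the two URLs as different; comparing the normalized forms reports them as equal.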

However, consider duplicate keys:

https://example.com?foo=3&foo=2&foo=1

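The URI specification supports duplicate keys, but does not address precedence or ordering.

In a given application, each of these could trigger a different result and still be valid: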

https://example.com?foo=1&foo=2&foo=3
https://example.com?foo=1&foo=3&foo=2
https://example.com?foo=2&foo=3&foo=1
https://example.com?foo=2&foo=1&foo=3
https://example.com?foo=3&foo=1&foo=2
https://example.com?foo=3&foo=2&foo=1
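To see why the ordering of duplicate keys can matter, here is a minimal sketch with Python 3's urllib.parse. The three "policies" at the end are hypothetical choices an application might make, not behaviour mandated by any specification.

```python
from urllib.parse import parse_qs, parse_qsl

query = "foo=3&foo=2&foo=1"

# parse_qsl keeps every pair in the order it appears.
pairs = parse_qsl(query)

# parse_qs groups values per key, again in order of appearance.
grouped = parse_qs(query)

# Three policies an application might choose (all hypothetical).
first_wins = grouped["foo"][0]
last_wins = grouped["foo"][-1]
all_values = grouped["foo"]

print(pairs, first_wins, last_wins, all_values)
```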

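+ is a reserved character that represents a space in urlencoded form (as opposed to %20 in the path part). urllib.urlencode escapes using urllib.quote_plus(), not urllib.quote(). The OP most likely just wanted to do this: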

initial_url = "http://www.stackoverflow.com"
search = "Generate value"
urllib.urlencode((("data", initial_url), ("search", search)))

which produces:

data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

as output.
