python - Python urllib2 response headers

Question

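I'm trying to extract the response headers of a URL request. When I use Firebug to analyze the response to the request, it reports:

Content-Type text/html

However, when I use this Python code:

urllib2.urlopen(URL).info()

the output is:

Content-Type: video/x-flv

I'm new to Python and to web programming in general, so any helpful insight is much appreciated. Also, let me know if more information is needed.

Thanks in advance for reading this post.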

score 37 · Accepted Answer

Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:

import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')

There's also HTTPCookieProcessor, which can help if the server depends on cookies, but I don't think you'll need it in most cases. Have a look at Python's documentation:

http://docs.python.org/library/urllib2.html
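
For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched roughly as follows. The URL and the header values are placeholders (assumptions, not taken from the question), and the helper names build_request and fetch_content_type are hypothetical; copy the real header values from the browser's request.

```python
import urllib.request


def build_request(url, headers):
    """Return a Request carrying the given browser-like headers."""
    request = urllib.request.Request(url)
    for name, value in headers.items():
        request.add_header(name, value)
    return request


def fetch_content_type(request, timeout=10):
    """Send the request and return the Content-Type the server reports."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


# Placeholder values: replace them with the headers the browser really sent.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (placeholder agent string)",
    "Accept": "text/html,application/xhtml+xml",
}
request = build_request("http://example.com/", browser_headers)

# Uncomment to perform the actual network request:
# print(fetch_content_type(request))
```

Note that Request stores header names in capitalized form, so the header added as 'User-Agent' is looked up as request.get_header('User-agent').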

score 5 · Accepted Answer

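Content-Type text/html

Really, like that, without the colon?

If so, that might explain it: it's an invalid header, so it gets ignored, and urllib then guesses the content type by looking at the filename. If the URL happens to end in '.flv', it will guess that the type should be video/x-flv.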
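
To show what extension-based guessing looks like, here is a small sketch using the standard mimetypes module. It only demonstrates the guessing step; it does not confirm that urllib2 applies it to HTTP responses. The URL is a made-up example, and the '.flv' mapping is registered explicitly because it is an assumption that not every system's MIME table includes it.

```python
import mimetypes

# Assumption: register the mapping ourselves so the result does not depend
# on the MIME tables installed on this machine.
mimetypes.add_type("video/x-flv", ".flv")

# guess_type looks only at the name, never at the content or at any headers.
guessed_type, encoding = mimetypes.guess_type("http://example.com/clip.flv")
print(guessed_type)  # video/x-flv
```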

score 2 · Accepted Answer

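This particular discrepancy might be explained by different headers being sent by the two requests (perhaps the Accept-type headers). Can you check that? Or, if JavaScript is running in Firefox (which I assume you're using when you run Firebug), then since it's definitely not running in the Python case, "all bets are off", as they say ;-).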

score 1 · Accepted Answer

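Keep in mind that a web server can return different results for the same URL depending on differences in the request. One example is content-type negotiation: the requester can specify a list of content types it will accept, and the server can return different results to try to accommodate those needs.

Also, you may be getting an error page for one of the requests, for example because it is malformed, or because you don't have the cookies set that authenticate you properly, and so on. Look at the response itself to see what you are getting.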
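
The following is a self-contained sketch of that effect, written for Python 3. It starts a toy local server (the NegotiatingHandler class and the content_type_for helper are hypothetical, invented for illustration) that picks its Content-Type from the Accept header, then requests the same URL twice with different Accept values. A real server's negotiation rules will differ.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Toy server: picks the Content-Type from the request's Accept header."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "text/html" in accept:
            content_type, body = "text/html", b"<p>player page</p>"
        else:
            content_type, body = "video/x-flv", b"FLV"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def content_type_for(url, accept):
    """Request url with the given Accept header; return the Content-Type."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.headers.get("Content-Type")


# Port 0 lets the operating system choose a free port.
server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/clip.flv" % server.server_port

as_browser = content_type_for(url, "text/html,application/xhtml+xml")
as_script = content_type_for(url, "*/*")

server.shutdown()
server.server_close()
print(as_browser, as_script)  # text/html video/x-flv
```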

score 0 · Accepted Answer

According to http://docs.python.org/library/urllib2.html there is only a get_header() method, and nothing about getheader.

I'm asking because your code works with

response.info().getheader('Set-Cookie')

but as soon as I run

response.info().get_header('Set-Cookie')

I get:

Traceback (most recent call last):
  File "baza.py", line 11, in <module>
    cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'

Edit: Also,
response.headers.get('Set-Cookie') works fine too, and it isn't mentioned in the urllib2 documentation....
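
For comparison, here is a sketch of header access on Python 3, where the headers object is an http.client.HTTPMessage built on email.message.Message. The raw header bytes are made up for illustration, so no network request is needed. On this class the dictionary-style get() works, while getheader() no longer exists.

```python
import io
import http.client

# Made-up raw response headers, terminated by the blank line HTTP requires.
raw = b"Content-Type: text/html\r\nSet-Cookie: session=abc\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))

content_type = headers.get("Content-Type")
cookie = headers["set-cookie"]  # header lookups are case-insensitive
has_getheader = hasattr(headers, "getheader")

print(content_type)   # text/html
print(cookie)         # session=abc
print(has_getheader)  # False
```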
