python - Python - Getting header information from a URL

Question

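I have been looking all over for a Python 3.x code sample that gets HTTP header information.

Something as simple as get_headers in PHP is not easy to find in Python. Or maybe I am just not sure how best to wrap my head around it.

Essentially, I want to write something that can check whether a URL exists or not,

something along the lines of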

h = get_headers(url)
if(h[0] == 200)
{
   print("Bingo!")
}

So far I have tried

h = http.client.HTTPResponse('http://docs.python.org/')

but it always throws an error.

score 11 · Accepted Answer

To get the HTTP response code in Python 3.x, use the urllib.request module:

>>> import urllib.request
>>> response =  urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo

The returned HTTPResponse object also gives you access to all the headers. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
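To look at every header rather than a single one, the response's headers attribute can be iterated. The following is a minimal sketch; the helper name fetch_headers and its timeout default are assumptions made for illustration, not part of the original answer.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Return (status_code, headers) for url.

    fetch_headers is a hypothetical helper name used for illustration.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like a mapping of header names to values.
        # Converting to dict keeps only one value for a repeated header
        # (for example multiple Set-Cookie lines).
        return response.getcode(), dict(response.headers.items())
```

Calling fetch_headers('http://docs.python.org/') would return the status code together with a dictionary of the headers the server sent.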

If the call to urllib.request.urlopen() fails, an HTTPError exception is raised. You can handle it to get the response code:

import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
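Since the question is only about whether a URL exists, a HEAD request may be enough, because it asks for the headers without the body. Below is a minimal sketch of that idea; the helper name url_exists, the timeout default, and the choice to treat only 2xx responses as "exists" are assumptions for illustration.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request to url gets a 2xx response.

    url_exists is a hypothetical helper name used for illustration.
    """
    # method="HEAD" asks the server for headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.getcode() < 300
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 404.
        return False
    except urllib.error.URLError:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return False
```

HTTPError is caught before URLError because it is a subclass of URLError. Some servers do not handle HEAD well, so falling back to GET may be needed in practice.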

score 2 · Accepted Answer

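For Python 2.x

You can use urllib, urllib2, or httplib here. Note, however, that urllib and urllib2 are built on top of httplib. So if you plan to run this check many times (thousands of times), it is better to use httplib directly. Further documentation and examples are here.

Example code: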

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."

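For Python 3.x

The same story as with urllib (or urllib2) and httplib in Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be faster. For more documentation and examples, look here.

Example code: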

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()    
except Exception as ex:
    print("Could not connect to page.")

If you want to check the status code, you need to replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")

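Note that in both examples, if you want to catch the specific exception raised when the URL does not exist, rather than all exceptions, catch socket.gaierror instead (see the socket documentation).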
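As a rough sketch of that last point, the Python 3.x snippet could be narrowed to catch only socket.gaierror, which is raised when the host name cannot be resolved. The helper name head_status and its parameters are assumptions for illustration.

```python
import http.client
import socket


def head_status(host, path="/", timeout=10):
    """Return the status of a HEAD request, or None if host does not resolve.

    head_status is a hypothetical helper name used for illustration.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except socket.gaierror:
        # Name resolution failed, so there is no such host to talk to.
        return None
    finally:
        conn.close()
```

Other failures, such as a refused connection or a timeout, are deliberately left to propagate so they are not mistaken for a missing host.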

score 2 · Accepted Answer

You can use the requests module to check it:

import requests
url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")

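You can also check the header contents before downloading the whole content of the web page by sending a HEAD request (for example with requests.head).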

score 1 · Accepted Answer

You can use the urllib2 library (Python 2.x only):

import urllib2
if urllib2.urlopen(url).code == 200:
    print "Bingo"

answered 2013-02-19T04:08:41.383
