python - Python link puller

Question

I have been successfully using this Python script:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('https://conceled:conceled@traveler.pha.phila.gov:8443/servlet/traveler')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']

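to pull the links out of a website. It works on almost any other site, but when I try it on the one above (the one I actually need it to work on), I get a lot of errors: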

Traceback (most recent call last):
  File "C:\Users\joe\Desktop\PHA\AndroidPhones\androidphonescript2.py", line 5, in <module>
    status, response = http.request('https://conceled@traveler.pha.phila.gov:8443/servlet/traveler')
  File "C:\Python27\lib\httplib2.py", line 608, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cacheFullPath)
  File "C:\Python27\lib\httplib2.py", line 449, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "C:\Python27\lib\httplib2.py", line 427, in _conn_request
    conn.connect()
  File "C:\Python27\lib\httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11003] getaddrinfo failed

score 1 · Accepted Answer

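The site's certificate is invalid, but that does not appear to be what is causing the problem. What version of httplib2 are you using? I just installed the current version, 0.7.7, and I get better exception text: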

  File "d:\Python27\lib\site-packages\httplib2-0.7.7-py2.7.egg\httplib2\__init__.py", line 1287, in _conn_request
    raise ServerNotFoundError("Unable to find the server at %s" % conn.host)
ServerNotFoundError: Unable to find the server at conceled:conceled@traveler.pha.phila.gov

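So it is not parsing the //username:password@ part as a username and password; the whole string is being treated as the host name, which is why the lookup fails. The httplib2 documentation indicates that credentials should be supplied via: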

Http.add_credentials(name, password[, domain=None])

So try:

http = httplib2.Http()
http.add_credentials(name, password)
status, response = http.request('https://traveler.pha.phila.gov:8443/servlet/traveler')

I don't have an account on the site, so I can't test this.

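If you need to support a username and password embedded in the URL, you will have to write the code to parse them out yourself. That should not be too hard with a regular expression (the Python re module).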
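As a rough sketch of that last step, the credentials can be separated from the URL before it is handed to httplib2. Note the assumptions: this is written for Python 3 (the question uses Python 2.7), it uses the standard library's urllib.parse rather than a regular expression, and split_credentials is a hypothetical helper name, not part of httplib2. The example URL and credentials are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit, unquote


def split_credentials(url):
    """Hypothetical helper: return (url_without_credentials, username, password).

    username and password are None when the URL does not contain them.
    """
    parts = urlsplit(url)
    # Everything after the last "@" in the network location is host[:port].
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    # Credentials in a URL may be percent-encoded, so decode them.
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return clean_url, username, password


# Placeholder URL and credentials, for illustration only.
clean_url, user, pw = split_credentials(
    "https://alice:s3cret@example.com:8443/servlet/traveler"
)
```

The results could then be passed on as in the answer above: call http.add_credentials(user, pw) when user is not None, followed by http.request(clean_url). As with the original suggestion, this has not been tested against the site in question.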
