This simple Python 3 script:

import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)

raises this exception:

Traceback (most recent call last):
  File "C:/Users/ricardo/Desktop/Google-Scholar/BibTex/test2.py", line 8, in <module>
    urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1569, in retrieve
    fp = self.open(url, data)
  File "C:\Python32\lib\urllib\request.py", line 1541, in open
    raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
  File "C:\Python32\lib\urllib\request.py", line 1537, in open
    return getattr(self, name)(url)
  File "C:\Python32\lib\urllib\request.py", line 1715, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python32\lib\urllib\request.py", line 1695, in _open_generic_http
    http_conn.request("GET", selector, headers=headers)
  File "C:\Python32\lib\http\client.py", line 967, in request
    self._send_request(method, url, body, headers)
  File "C:\Python32\lib\http\client.py", line 1005, in _send_request
    self.endheaders(body)
  File "C:\Python32\lib\http\client.py", line 963, in endheaders
    self._send_output(message_body)
  File "C:\Python32\lib\http\client.py", line 808, in _send_output
    self.send(msg)
  File "C:\Python32\lib\http\client.py", line 746, in send
    self.connect()
  File "C:\Python32\lib\http\client.py", line 724, in connect
    self.timeout, self.source_address)
  File "C:\Python32\lib\socket.py", line 386, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed

I can open the URL produced by the print statement just fine:

http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0

What is causing this? I tried changing http:// to http:/// (three slashes), but that raised the same exception.

1 Answer

Here is your problem:

urllib.request.urlretrieve("http://scholar.google.com" + url, filename)

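You are adding the http://scholar.google.com part twice (url already starts with http://scholar.google.com). As a result, urllib thinks you are requesting a page from the host scholar.google.comhttp, and needless to say, that domain does not exist. That is exactly what the getaddrinfo failed error is telling you.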
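To see why the lookup targets a nonexistent host, the doubled string can be split into its components. This is only an illustrative sketch: it uses urllib.parse.urlsplit, which is not necessarily the parsing path urlretrieve takes internally, and a shortened path stands in for the full query string from the question.

```python
from urllib.parse import urlsplit

# Shortened stand-in for the URL built in the question (query string omitted).
url = "http://scholar.google.com/scholar.bib"

# The argument the question's script actually passes to urlretrieve.
doubled = "http://scholar.google.com" + url

# Everything between "//" and the next "/" is treated as the network location.
parts = urlsplit(doubled)
print(parts.netloc)    # network location as parsed from the doubled string
print(parts.hostname)  # host name that a DNS lookup would be attempted for
```

The host name here runs from the first "//" up to the next "/", so the second scheme prefix is absorbed into it.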

Just request url on its own, obviously.
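A minimal sketch of the corrected script follows. The host, link, and filename values are the ones from the question. The helper names build_url and download_citation are hypothetical, introduced only so that nothing touches the network until the download function is called.

```python
import urllib.request

HOST = "scholar.google.com"
LINK = ("/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/"
        "&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0")


def build_url(host, link):
    """Build the full URL once, with a single scheme-and-host prefix."""
    return "http://" + host + link


def download_citation(filename="cite0.bib"):
    """Hypothetical helper: fetch the citation using the URL exactly as built."""
    url = build_url(HOST, LINK)
    print(url)  # prints the same value that is passed to urlretrieve
    urllib.request.urlretrieve(url, filename)  # no extra prefix added here
    return url
```

Calling download_citation() performs the request. The only change from the question's script is that url is passed to urlretrieve unchanged.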

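A handy tip for finding this kind of thing faster in future: when adding print statements for debugging, always print the actual value used in the command you are debugging. If your print statement had concatenated the base URL the same way the urlretrieve call does, you would have spotted the problem in about two seconds.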
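One way to follow that tip is to bind the argument to a name first, then print and pass that same name. The sketch below assumes a shortened stand-in for the question's URL, and request_url is a hypothetical variable name. It reproduces the question's buggy concatenation on purpose, to show what the print would have revealed.

```python
# Shortened stand-in for the URL built in the question (query string omitted).
url = "http://scholar.google.com/scholar.bib"

# Build the argument exactly as the failing call does, but give it a name.
request_url = "http://scholar.google.com" + url

# Printing the named value shows the doubled prefix immediately.
print(request_url)

# The same name would then be passed on unchanged, for example:
# urllib.request.urlretrieve(request_url, filename)
```

Because the printed value and the passed value are the same object, the debug output cannot drift from what the call receives.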

answered 2012-07-17T22:03:20.273