I wrote some code that I use for web scraping:

import urllib2

# div is presumably a BeautifulSoup tag obtained earlier in the script
link = 'http://www.cmegroup.com' + div.findAll('a')[3]['href']
user_agent = 'Mozilla/5.0'
headers = {'User-Agent': user_agent}
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()

What I don't understand is that I sometimes get an error when requesting a link, and sometimes I don't. For example, the error:

urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

is raised for this link:

http://www.cmegroup.com/trading/energy/refined-products/mini-european-naphtha-platts-cif-nwe-swap-futures_product_calendar_futures.html

When I rerun the code, I no longer get the error for this link, but for some other link instead. Could this be caused by the wireless connection?

1 Answer

This looks like a DNS or network problem. If you run the same code for the same URL several times and it sometimes works but sometimes doesn't, the problem is probably not your code.
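One way to check whether name resolution is the culprit is to try the lookup directly, outside of urllib2. The following is only a minimal sketch: the helper name can_resolve and its resolver parameter are made up for illustration and are not part of any library; socket.getaddrinfo and socket.gaierror are the standard-library pieces it relies on.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a DNS lookup failure.

    can_resolve and the resolver parameter are hypothetical names used
    for this sketch; resolver defaults to the real socket.getaddrinfo.
    """
    try:
        resolver(hostname, 80)
    except socket.gaierror:
        # Raised for lookup failures such as "Name or service not known"
        return False
    return True
```

Calling can_resolve('www.cmegroup.com') several times in a row should show whether the lookup itself fails intermittently. If it does, the problem is in your network or DNS setup rather than in the scraping code.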

To debug the issue, you could wrap the call in a try-except block and start pdb (or ipdb, if installed) from there:

try:
    response = urllib2.urlopen(req)
except urllib2.URLError as ex:
    import pdb; pdb.set_trace()  # Use ipdb if installed
else:
    page = response.read()

Then you can take a look at the exception (ex.reason, and ex.code if it is an HTTPError, which is a subclass of URLError), the exception trace, etc.
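If the failures turn out to be transient, a possible workaround is to retry the request a few times before giving up. The sketch below is an assumption-laden illustration, not a definitive fix: it is written for Python 3, where urllib2 was split into urllib.request and urllib.error, and the function name fetch_with_retry and its attempts, delay and opener parameters are invented for this example.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, headers=None, attempts=3, delay=1.0,
                     opener=urllib.request.urlopen):
    """Fetch url and return the body, retrying on URLError.

    fetch_with_retry, attempts, delay and opener are hypothetical names
    for this sketch; opener defaults to the real urllib.request.urlopen.
    """
    req = urllib.request.Request(url, headers=headers or {})
    for attempt in range(1, attempts + 1):
        try:
            return opener(req).read()
        except urllib.error.HTTPError:
            # The server answered with an error status, so do not retry.
            raise
        except urllib.error.URLError:
            # Covers failures such as "Name or service not known".
            if attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

On Python 2 the same structure should work with urllib2.Request, urllib2.urlopen, urllib2.HTTPError and urllib2.URLError. Retrying only hides the symptom, though, so it is still worth finding out why the lookups fail.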

(As a side note, if external dependencies are not a problem, I'd strongly recommend using the requests package instead of urllib2.)

Answered 2013-08-16T12:24:22.907