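I'm new to Python and am currently learning to write web-scraping scripts with it. Something strange keeps happening, though, and I can't figure out why. After some testing, I believe the problem lies in the urllib2.urlopen() function. Bear with me:
When I start the Python interpreter from bash by running python and type: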
import urllib2
urllib2.urlopen("http://www.baidu.com/") # which is a Chinese version of Google that most of us use only to test if the network connection is fine
things go wrong very quickly:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 2] No such file or directory>
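I don't know exactly what this means, but I did some research online. Most of the results didn't help in my case, but I did see someone claim that everything works when run with sudo.
So I tried it: I started Python from bash with sudo python and ran exactly the same code as above. This time it seemed to hang forever. In the end I had to interrupt it (KeyboardInterrupt), and no matter how long I let it hang first, I get the same traceback: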
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1181, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 562, in create_connection
    sock.connect(sa)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
KeyboardInterrupt
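I'm running Python on an Ubuntu 13.04 desktop system booted from a portable flash drive, and it is currently behind my company's proxy. Not knowing whether the proxy is the problem, I tried setting the proxy environment variable: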
$ export http_proxy="http://domain\username:password@proxyserver:port"
With that set, at least wget works properly.
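One way to check whether that variable actually reaches Python is to print the proxy settings the interpreter sees, once from a plain python session and once under sudo python. This is only a sketch, not a confirmed diagnosis: sudo commonly resets the environment by default, so http_proxy may not be visible there. The helper names describe_proxies and opener_for and the host proxy.example are made up for illustration. The imports fall back to the Python 3 module names so the same snippet runs on either version.

```python
try:
    # Python 2, as used in this post
    from urllib import getproxies
    from urllib2 import ProxyHandler, build_opener
except ImportError:
    # Python 3 equivalents
    from urllib.request import getproxies, ProxyHandler, build_opener


def describe_proxies(proxies):
    """Return one line per proxy entry, hiding any credentials in the URL."""
    lines = []
    for scheme, url in sorted(proxies.items()):
        # Credentials, if present, sit between '//' and the last '@'
        head, sep, host = url.rpartition("@")
        if sep:
            prefix = head.split("//", 1)[0] + "//" if "//" in head else ""
            url = prefix + "***@" + host
        lines.append("%s -> %s" % (scheme, url))
    return lines


def opener_for(proxy_url):
    """Build an opener that sends http requests through proxy_url,
    regardless of what the environment says (hypothetical helper)."""
    return build_opener(ProxyHandler({"http": proxy_url}))


if __name__ == "__main__":
    # Show what this process (plain or under sudo) actually picked up
    found = describe_proxies(getproxies())
    print("\n".join(found) if found else "no proxy settings found")
```

If the two sessions print different settings, passing the proxy explicitly, for example opener_for("http://proxy.example:8080").open("http://www.baidu.com/"), takes the environment out of the picture. The address here is a placeholder, not my real proxy.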
For comparison, when I ssh into my desktop computer at home and run the same code, everything seems fine, with or without sudo:
Python 2.7.3 (default, Apr 10 2013, 05:09:49)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> urllib2.urlopen("http://www.baidu.com/")
<addinfourl at 3071291372L whose fp = <socket._fileobject object at 0xb71d6eec>>
>>>
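I once tried booting Ubuntu from the same flash drive on my laptop, and it didn't work well there either, but I can't remember the details. I'll take the drive home after work today and test it to gather more information, which I'll post back here. Until then, can anyone help?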