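I feel like I'm missing something very basic here about Python process limits. I have a screen scraper that is supposed to visit a password-protected site once a week, fill out a form to update existing records, and then scrape the new ones. (In case it matters, I'm using Django to actually insert the records.)
The data I'm scraping accumulates over the course of a year, so in January the process is fairly quick. By August there are thousands of rows to update, on top of any new records being added.
It worked like a dream earlier this year, but it has recently started hitting a connection error with this traceback: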
Traceback (most recent call last):
  File "douglasdivorces.py", line 42, in <module>
    forms = [f for f in br.forms()]
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4.py2.6.egg/mechanize/_mechanize.py", line 420, in forms
    return self._factory.forms()
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4-py2.6.egg/mechanize/_html.py", line 557, in forms
    self._forms_factory.forms())
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4-py2.6.egg/mechanize/_html.py", line 237, in forms
    _urlunparse=_rfc3986.urlunsplit,
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4-py2.6.egg/mechanize/_form.py", line 844, in ParseResponseEx
    _urlunparse=_urlunparse,
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4-py2.6.egg/mechanize/_form.py", line 979, in _ParseFileEx
    data = file.read(CHUNK)
  File "/usr/local/lib/python2.6/dist-packages/mechanize-0.2.4-py2.6.egg/mechanize/_response.py", line 195, in read
    data = self.wrapped.read(to_read)
  File "/usr/lib/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.6/httplib.py", line 518, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.6/httplib.py", line 551, in _read_chunked
    line = self.fp.readline()
  File "/usr/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
  File "/usr/lib/python2.6/ssl.py", line 96, in <lambda>
    self.recv = lambda buflen=1024, flags=0: SSLSocket.recv(self, buflen, flags)
  File "/usr/lib/python2.6/ssl.py", line 217, in recv
    return self.read(buflen)
  File "/usr/lib/python2.6/ssl.py", line 136, in read
    return self._sslobj.read(len)
socket.error: [Errno 104] Connection reset by peer
Is there a way to trap this error and keep my loop in place until the problem clears up? Or should I be taking another approach entirely?
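To make the "trap it and keep looping" idea concrete, here is a rough sketch of the kind of retry I have in mind. It is not taken from my script: the helper name `retry_on_reset` and the attempt, delay, and backoff values are all made up for illustration, and I'm assuming the reset is transient enough that waiting and trying again would help.

```python
import errno
import socket
import time


def retry_on_reset(func, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func() and retry it when the peer resets the connection.

    Hypothetical helper: the defaults are guesses, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except socket.error as exc:
            # Retry only on ECONNRESET (Errno 104 on Linux). Any other
            # socket error, or running out of attempts, is re-raised.
            if exc.errno != errno.ECONNRESET or attempt == attempts:
                raise
            sleep(delay)      # wait before the next attempt
            delay *= backoff  # wait longer each time
```

I would presumably wrap the whole fetch-and-parse step in the function I pass in, not just the `br.forms()` call, since the reset happens while the response is still being read and the page would likely need to be requested again.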
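Again, I'm hoping I'm missing something preschool-level, so I'll spare you the pain of reading through my code. If it's not that simple, say the word and I'll edit the question to include the script.
Thanks very much! I'd love to know what's ailing me!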