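I have a function that uses urllib2 to open a page and pull some data out of it. It works fine about 80% of the time, but the other 20% of the time I get an IncompleteRead exception.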
Here is the traceback:
Traceback (most recent call last):
  File "test.py", line 380, in <module>
    main()
  File "test.py", line 109, in main
    soups.append(BeautifulSoup(out_queue.get().read()))
  File "c:\python27\lib\socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "c:\python27\lib\httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "c:\python27\lib\httplib.py", line 601, in _read_chunked
    value.append(self._safe_read(chunk_left))
  File "c:\python27\lib\httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(958 bytes read, 678 more expected)
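The two numbers in the last line come from the exception's `partial` and `expected` attributes. The sketch below rebuilds the same message by hand using Python 3's `http.client`, the renamed `httplib`. The byte values are made up and only the lengths match the traceback. One caveat, going by the traceback itself: on Python 2.7's chunked path the exception is raised from inside `_safe_read`, so `partial` may hold only the bytes of the chunk that was cut short, not the whole body read so far.

```python
import http.client

# Build an IncompleteRead like the one in the traceback:
# 958 bytes actually received, 678 more still expected by the failed read.
err = http.client.IncompleteRead(b"x" * 958, 678)

print(len(err.partial))  # bytes the failed read received
print(err.expected)      # bytes it was still waiting for
print(repr(err))
```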
I open the page in the most basic way:
response = urllib2.urlopen('the_url')
and later in the program turn it into a BeautifulSoup object.
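A minimal sketch of that open-then-parse flow, assuming Python 3 (`urllib2.urlopen` became `urllib.request.urlopen`). It reads the body in one place and catches the exception there instead of inside the `BeautifulSoup(...)` call. `read_body` is a hypothetical helper name. How much `e.partial` holds depends on the Python version and transfer encoding, so treat it as best-effort data.

```python
import http.client


def read_body(response):
    """Return (data, complete) for a response object.

    `response` is anything with a .read() method, for example the object
    returned by urllib.request.urlopen() (Python 3's urllib2.urlopen).
    """
    try:
        # read() with no argument returns the whole body, or raises
        # IncompleteRead when the server announced a length (Content-Length
        # or chunked framing) and the connection ended before it was met.
        return response.read(), True
    except http.client.IncompleteRead as e:
        # Hand back the bytes the exception carries and flag them as partial.
        return e.partial, False
```

Usage, assuming `url` holds the page address and bs4 is installed: `data, complete = read_body(urllib.request.urlopen(url, timeout=10))`, then build `BeautifulSoup(data, "html.parser")` only when `complete` is true, or accept the partial page if a truncated parse is good enough.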
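Is there a way to deal with this in urllib2 when the initial request is made? Or is there some way to verify that the data is "complete" before I try to do anything with it?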
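On the "complete" question: when `read()` returns without raising, `http.client` has received the full Content-Length or the final zero-length chunk, so the absence of IncompleteRead is itself the completeness check. Only a body delimited by the server closing the connection cannot be verified this way. A common workaround is to retry the whole request a few times. The sketch below assumes Python 3. `fetch_with_retry`, `attempts` and `delay` are hypothetical names, and `open_response` is any zero-argument callable returning a fresh response. If the server truncates the same page every time, retrying will not help.

```python
import http.client
import time


def fetch_with_retry(open_response, attempts=3, delay=1.0):
    """Open and read a page, retrying when the body arrives truncated.

    open_response: zero-argument callable returning a new response object,
    e.g. lambda: urllib.request.urlopen(url, timeout=10).
    Returns the full body as bytes, or re-raises the last IncompleteRead.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        response = open_response()
        try:
            # A successful read() means the announced body length was met.
            return response.read()
        except http.client.IncompleteRead as e:
            last_error = e
            if attempt < attempts - 1:
                time.sleep(delay)  # back off briefly before a fresh request
        finally:
            response.close()
    raise last_error
```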