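I am trying to scrape data from some websites. After a while, the crawler starts reporting twisted.internet.error.ConnectionLost errors. I don't understand how Twisted works. Because of these errors, the crawler also keeps running for a very long time, and I don't know what is making it so slow. Could you suggest some possible causes? My internet connection is fine.
Here are the errors: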
2014-02-04 14:22:20+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-02/romanians-reject-euro-loans-after-hungary-disaster-mortgages.html> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionLost'>>]
2014-02-04 14:22:20+0530 [bb] INFO: Crawled 20 pages (at 7 pages/min), scraped 0 items (at 0 items/min)
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-03/u-s-said-to-probe-banks-over-sovereign-wealth-fund-deals.html> (failed 1 times): User timeout caused connection failure: Getting http://www.bloomberg.com/news/2014-02-03/u-s-said-to-probe-banks-over-sovereign-wealth-fund-deals.html took longer than 180 seconds..
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://search1.bloomberg.com/search/?content_type=all&page=1&q=ROYAL%20BANK%20OF%20CANADA> (failed 1 times): User timeout caused connection failure: Getting http://search1.bloomberg.com/search/?content_type=all&page=1&q=ROYAL%20BANK%20OF%20CANADA took longer than 180 seconds..
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-03/canada-consumer-sentiment-dips-to-8-month-low-on-currency.html> (failed 1 times): User timeout caused connection failure.
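For context, the 180 seconds in the log matches Scrapy's default DOWNLOAD_TIMEOUT, so each stalled request may be waiting the full three minutes before it is retried, which could explain part of the slowness. Below is a rough sketch of the settings.py values that seem relevant. The setting names are real Scrapy settings, but every value here is an assumption for illustration and has not been tested against this site.

```python
# settings.py (sketch): all values below are illustrative assumptions,
# not tested recommendations for this crawl.

# Give up on a stalled request sooner than the 180 s default.
DOWNLOAD_TIMEOUT = 60

# Keep retries on, but bound how many times a failed request is retried.
RETRY_ENABLED = True
RETRY_TIMES = 2

# Slow the crawl down in case the server drops connections under load.
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Let Scrapy adapt the delay to the observed response latency.
AUTOTHROTTLE_ENABLED = True
```

If the ConnectionLost errors continue with a lower request rate, the cause is probably something other than load on the server side.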
Thanks for your help.