python - Scrapy: Connection refused

Question

I get this error message when I try to test my Scrapy installation:

$ scrapy shell http://www.google.es
2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines: 
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-02-16 10:54:46+0100 [default] INFO: Spider opened
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
    ]
2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
    Traceback (most recent call last):
    Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.

2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)

Versions:

Scrapy 0.12.0.2536
Python 2.6.6
OS: Ubuntu 10.10

Edit: I can reach the site with my browser, with wget, and with telnet google.es 80, and the error happens with every site, not just this one.

score 10 · Accepted Answer

Attempt 1: Scrapy sends a User-Agent that identifies it as a bot, and some sites block requests based on the User-Agent.

Try overriding USER_AGENT in settings.py.

For example: USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1'

Attempt 2: Try adding a delay between requests so that the traffic looks as if a human were sending it.

DOWNLOAD_DELAY = 0.25
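A note on how that delay is applied: in current Scrapy releases RANDOMIZE_DOWNLOAD_DELAY is enabled by default, so each wait is drawn uniformly between 0.5× and 1.5× DOWNLOAD_DELAY (0.125 to 0.375 seconds for the value above). The sketch below only reproduces that arithmetic in plain Python 3. effective_delay is a hypothetical helper, not a Scrapy API, and an old release such as 0.12 may behave differently.

```python
import random


def effective_delay(download_delay, randomize=True, rng=random):
    """Return one wait interval in seconds (hypothetical helper, not Scrapy code).

    With randomize=True the wait is drawn uniformly from
    [0.5 * download_delay, 1.5 * download_delay], which is what
    RANDOMIZE_DOWNLOAD_DELAY does in current Scrapy releases.
    """
    if not randomize:
        return download_delay
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


if __name__ == "__main__":
    DOWNLOAD_DELAY = 0.25
    low, high = 0.5 * DOWNLOAD_DELAY, 1.5 * DOWNLOAD_DELAY
    print(f"each wait falls between {low} and {high} seconds")
    # Five sample waits, rounded for readability
    print([round(effective_delay(DOWNLOAD_DELAY), 3) for _ in range(5)])
```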

Attempt 3: If nothing else works, install Wireshark and compare the request headers or POST data that Scrapy sends with what the browser sends.
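For plain-HTTP requests, a lighter alternative to Wireshark is to point each client at a local server that prints and echoes back the headers it receives, then compare the output. This is a minimal Python 3 sketch using only the standard library; the handler and function names and port 6081 are assumptions chosen for illustration. It only sees requests that actually reach the server, so it cannot explain a refusal at the TCP level.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Print each GET's request headers and send them back as plain text."""

    def do_GET(self):
        body = "".join(f"{name}: {value}\n" for name, value in self.headers.items())
        print(f"--- {self.command} {self.path}\n{body}", flush=True)  # server-side copy
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log on stderr


def start_echo_server(port=0):
    """Serve EchoHeadersHandler on 127.0.0.1 in a daemon thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_headers_seen(url, user_agent):
    """Send one GET with the given User-Agent and return the headers the server saw."""
    # An empty ProxyHandler ignores any http_proxy variables, so the request
    # goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = start_echo_server(6081)  # 6081 is an arbitrary, assumed-free port
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(fetch_headers_seen(url, "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"))
    input(f"Open {url} in a browser or with `scrapy shell`, then press Enter to stop. ")
    server.shutdown()
    server.server_close()
```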

score 1 · Accepted Answer

There may be a problem with your network connection.

First, check your Internet connection.
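Because "111: Connection refused" is a TCP-level error raised before any HTTP is exchanged, it can help to test the raw connection from the same machine, user, and Python environment that runs Scrapy, much like `telnet host 80`. This is a minimal Python 3 sketch; check_tcp is a hypothetical helper, and the question's Python 2.6 has no ConnectionRefusedError. It separates a refused connection from a timeout, which often means a firewall is silently dropping packets.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect, roughly what `telnet host port` does (hypothetical helper).

    Returns "open", "refused", "timeout", or "error: <details>".
    "refused" corresponds to errno 111 (ECONNREFUSED) on Linux.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # DNS failures, unreachable networks, etc.
        return f"error: {exc}"


if __name__ == "__main__":
    print("www.google.es:80 ->", check_tcp("www.google.es", 80))
```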

If you reach the network through a proxy server, you need to configure that proxy in your Scrapy project (see http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware).
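A quick first check is which proxy settings Python itself picks up. In current Scrapy versions, HttpProxyMiddleware reads the standard http_proxy / https_proxy environment variables through urllib's getproxies(), and an individual request can also set request.meta['proxy']. This minimal Python 3 sketch prints what urllib.request.getproxies() returns; proxy_report is a hypothetical helper, and the proxy URL in the final comment is a placeholder.

```python
import urllib.request


def proxy_report(proxies):
    """Format a scheme -> proxy URL mapping as readable lines (hypothetical helper)."""
    if not proxies:
        return ["no proxy configured in the environment"]
    return [f"{scheme}: {url}" for scheme, url in sorted(proxies.items())]


if __name__ == "__main__":
    # On Linux, getproxies() reads *_proxy variables (http_proxy, https_proxy,
    # no_proxy, ...) from the environment.
    for line in proxy_report(urllib.request.getproxies()):
        print(line)
    # To route Scrapy through a proxy, export it before starting Scrapy, e.g.
    #   export http_proxy=http://proxy.example.com:3128   (placeholder address)
```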

Either way, try upgrading Scrapy to a newer version.

score 0 · Accepted Answer

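I ran into this error too. It turned out that the port I was connecting to was blocked by a firewall: my server blocks every port by default unless it has been whitelisted.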
