I'm having trouble using Scrapy + MongoDB with Tor. When I try to use a MongoDB pipeline in Scrapy, I get the following error:
2012-11-05 13:41:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
|S-chain|-<>-127.0.0.1:9050-<><>-127.0.0.1:27017-<--denied
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 131, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 97, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 42, in run
    q = self.crawler.queue
  File "/usr/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
    self._crawler.configure()
  File "/usr/lib/python2.7/dist-packages/scrapy/crawler.py", line 43, in configure
    self.engine = ExecutionEngine(self.settings, self._spider_closed)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/engine.py", line 33, in __init__
    self.scraper = Scraper(self, self.settings)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_settings(settings)
  File "/usr/lib/python2.7/dist-packages/scrapy/middleware.py", line 33, in from_settings
    mw = mwcls()
  File "/home/bharani/ABCD_scraper/political_forum_scraper/pipelines.py", line 9, in __init__
    settings['MONGODB_PORT'])
  File "/usr/local/lib/python2.7/dist-packages/pymongo/connection.py", line 290, in __init__
    self.__find_node()
  File "/usr/local/lib/python2.7/dist-packages/pymongo/connection.py", line 586, in __find_node
    raise AutoReconnect(', '.join(errors))
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused
I don't know how to solve this. When I don't use proxychains, the crawl works fine.
Any help is appreciated.
Thanks.
Edit:
The problem isn't specific to my code. See this link: http://isbullsh.it/2012/04/Web-crawling-with-scrapy/
It is a simple Scrapy tutorial that uses MongoDB. The crawler is supposed to be run with:
scrapy crawl isbullshit
That runs the crawler, and it works fine. To use Tor, it should be invoked like this:
proxychains scrapy crawl isbullshit
This doesn't work for me. The tutorial's source code is here: https://github.com/BaltoRouberol/isbullshit-crawler
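
To narrow this down: the `|S-chain|` line in the log suggests that proxychains is also sending the connection to the local MongoDB port (127.0.0.1:27017) through Tor at 127.0.0.1:9050, where it is denied. Below is a minimal sketch for checking that, assuming MongoDB listens on the host and port shown in the traceback. The helper name `can_connect` is hypothetical and not part of Scrapy or pymongo; it only opens a plain TCP connection.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections and timeouts on Python 2.7 and 3.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # 27017 is the port from the traceback; the host is assumed to be local.
    print("MongoDB port reachable: %s" % can_connect("127.0.0.1", 27017))
```

Running the script once directly and once under `proxychains` should show whether the local connection only fails when it goes through the proxy chain. If it does, the MongoDB connection itself is being proxied, and the spider code is probably not the cause.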