python - How do I call spiders from different projects, each with its own pipeline, from a Python script?

Question

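I have three different spiders in separate Scrapy projects named REsale, REbuy and RErent. Each has its own pipeline that directs its output to various MySQL tables on my server. They all run fine when invoked with scrapy crawl. Ultimately, I want a script that can run as a service on my Windows 7 machine and run the spiders at different intervals. At the moment I am stuck on the Scrapy API: I can't even get it to run one of the spiders! Is there a particular location where the script needs to be saved? Currently it just sits in my root Python directory. Sale, Buy and Rent are the spider names I use with scrapy crawl, and sale_spider etc. are the spiders' .py files.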

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from REsale.spiders.sale_spider import Sale
from REbuy.spiders.buy_spider import Buy
from RErent.spiders.sale_spider import Rent

spider = Buy()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

spider = Rent()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

spider = Sale()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

This is the error it returns:

c:\Python27>File "real_project.py", line 5, in <module>
from REsale.spiders.sale_spider import Sale
ImportError: No module named REsale.spiders.sale_spider

I'm new to this and would greatly appreciate any help.

score 0 · Accepted Answer

I suggest you take a look at http://scrapyd.readthedocs.org/en/latest/, a ready-made Scrapy daemon for deploying and scheduling Scrapy spiders.
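
As a rough illustration of what that could look like, here is a minimal sketch that asks a scrapyd instance to start one run of each spider through its schedule.json endpoint. It assumes scrapyd is running locally on its default port 6800 and that the three projects have already been deployed to it under the names REsale, REbuy and RErent; neither of those is stated in the question. It is written for Python 3 (on Python 2.7 the urllib imports differ), and the helper names build_schedule_request, schedule and schedule_all are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: scrapyd is running on this machine on its default port.
SCRAPYD_URL = "http://localhost:6800"

# (project, spider) pairs using the names from the question. Assumption:
# each project has already been deployed to scrapyd under this name.
JOBS = [("REsale", "Sale"), ("REbuy", "Buy"), ("RErent", "Rent")]


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(base_url + "/schedule.json", data=body, method="POST")


def schedule(project, spider, base_url=SCRAPYD_URL):
    """Send the request and return scrapyd's decoded JSON reply."""
    with urlopen(build_schedule_request(project, spider, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def schedule_all(jobs=JOBS):
    """Queue one run of every spider and collect scrapyd's replies."""
    return [(project, spider, schedule(project, spider))
            for project, spider in jobs]
```

Calling schedule_all() queues one run per spider. Because scrapyd runs each spider inside its own deployed project, each run should pick up that project's settings and pipeline. scrapyd only queues a run when asked, so running the spiders at different intervals would still need something to trigger the script, for example Windows Task Scheduler.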
