
I just got Scrapy set up and running and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.

  1. Can you limit the number of links crawled? I have a site that doesn't use pagination and just lists a lot of links (which I crawl) on their home page. I feel bad crawling all of those links when I really just need to crawl the first 10 or so.

  2. How do you run multiple spiders at once? Right now I am using the command scrapy crawl example.com, but I also have spiders for example2.com and example3.com. I would like to run all of my spiders using one command. Is this possible?


2 Answers


For #1: instead of using the rules attribute to extract and follow links, write that logic in your parse function and yield or return Request objects from there.
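Here is a minimal sketch of that approach using the current Scrapy API; the CSS selector, the limit of 10, and the parse_item callback are all illustrative, not from the question:

    import scrapy

    class LimitedSpider(scrapy.Spider):
        name = "example.com"
        start_urls = ["http://example.com/"]

        def parse(self, response):
            # Follow only the first 10 links on the page instead of all of them.
            for href in response.css("a::attr(href)").getall()[:10]:
                yield response.follow(href, callback=self.parse_item)

        def parse_item(self, response):
            # Hypothetical extraction; replace with your real fields.
            yield {"url": response.url, "title": response.css("title::text").get()}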

For #2: try scrapyd.
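For instance, once your project is deployed to a running scrapyd instance, you can schedule all three spiders by POSTing to its schedule.json endpoint. A sketch, assuming scrapyd on its default port and a deployed project named "myproject" (both names are assumptions):

    import urllib.parse
    import urllib.request

    # Schedule each spider as a separate scrapyd job.
    for spider in ["example.com", "example2.com", "example3.com"]:
        data = urllib.parse.urlencode(
            {"project": "myproject", "spider": spider}
        ).encode()
        with urllib.request.urlopen("http://localhost:6800/schedule.json", data=data) as resp:
            print(spider, resp.read().decode())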

answered 2010-11-25T05:41:24.673

Credit goes to Shane, here: https://groups.google.com/forum/?fromgroups#!topic/scrapy-users/EyG_jcyLYmU

Using CloseSpider should allow you to specify limits of this kind.

http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.contrib.closespider

I haven't tried it yet since I didn't need it. It also looks like you may have to enable it as an extension in your settings file (see the top of that same page).
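As an illustration, a sketch of the relevant settings.py entries; the page count of 10 is illustrative, and the extension path is the one from the linked (older) docs:

    # settings.py
    # Enable the extension (this path matches the docs of that era; newer
    # Scrapy uses scrapy.extensions.closespider.CloseSpider and ships with
    # it enabled by default).
    EXTENSIONS = {
        "scrapy.contrib.closespider.CloseSpider": 500,
    }

    # Close the spider after it has crawled 10 pages; CLOSESPIDER_ITEMCOUNT
    # counts scraped items instead, if that fits better.
    CLOSESPIDER_PAGECOUNT = 10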

answered 2012-07-12T19:44:11.117