
Is there a way to get Nutch to crawl pages that are updated frequently more often?

E.g. index pages and feeds.

It would also be valuable to refresh new pages that contain comments more frequently during the first day after the page is created. Any tips are appreciated.


1 Answer


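What you need is the Adaptive Fetch Schedule. I wrote a blog post about how it works. Basically, this scheduler gradually makes pages that change more often get visited more and more frequently.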
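To make the idea concrete, here is a minimal sketch of the adaptive behaviour, not Nutch's actual code. The class name `AdaptiveIntervalSketch`, the method `nextInterval`, and the rate and bound values are all assumptions chosen for illustration. The re-fetch interval shrinks when a page was found changed, grows when it was unchanged, and is clamped to a minimum and maximum. In Nutch itself, as far as I know, the scheduler is selected with the `db.fetch.schedule.class` property (set to `org.apache.nutch.crawl.AdaptiveFetchSchedule`) and tuned through the `db.fetch.schedule.adaptive.*` properties; check `nutch-default.xml` for the exact names and defaults in your version.

```java
/** Hypothetical sketch of an adaptive re-fetch interval; not the Nutch implementation. */
final class AdaptiveIntervalSketch {
    // Illustrative values (assumptions). Intervals are in seconds.
    static final double INC_RATE = 0.4;                    // growth factor when the page is unchanged
    static final double DEC_RATE = 0.2;                    // shrink factor when the page has changed
    static final double MIN_INTERVAL = 60.0;               // never re-fetch more often than this
    static final double MAX_INTERVAL = 365.0 * 24 * 3600;  // never wait longer than this

    /** Returns the next re-fetch interval given the current one and whether the page changed. */
    static double nextInterval(double currentInterval, boolean pageChanged) {
        double interval = pageChanged
                ? currentInterval * (1.0 - DEC_RATE)   // changed: come back sooner
                : currentInterval * (1.0 + INC_RATE);  // unchanged: back off
        // Clamp so the interval stays within the configured bounds.
        return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, interval));
    }
}
```

With these assumed values, an index page or feed that changes on every fetch converges toward the minimum interval, while a static page drifts toward the maximum. That also roughly covers the "fresh page with comments" case: the page is fetched often while comments keep arriving, then less and less once it stops changing.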

Answered 2010-07-08T12:26:35.407