nutch - Nutch readlinkdb outputs nothing

Translated from: https://stackoverflow.com/questions/12781027 2012-10-08T11:42:06.710

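I crawled with Nutch 1.5 (using the crawl command), and afterwards the readlinkdb dump contains nothing. In my indexing filter, the links are empty as well. What causes the links to be empty?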

1 Answer

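Perhaps you are only indexing a single site. In that case, if db.ignore.internal.links is true in nutch-default.xml, Nutch will not store internal links (links between pages on the same host). Override it to false in nutch-site.xml and your link database will start to grow.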

<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored.  This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>
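For context, here is a minimal sketch of where that property would sit in a complete conf/nutch-site.xml, assuming the standard Hadoop-style `<configuration>` root element that Nutch's configuration files use. Any properties you already override in your own nutch-site.xml would stay alongside it; only db.ignore.internal.links comes from the answer above.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <!-- Keep links between pages on the same host, so the linkdb is
       populated even when the crawl covers a single site -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
  <!-- Other site-specific properties you already set go here as well -->
</configuration>
```

Because the setting is applied when links are inverted into the linkdb, an existing empty linkdb presumably has to be rebuilt from the segments (for example by re-running the crawl) before readlinkdb shows anything.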

answered 2013-04-06T17:08:04.773