6

I followed the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial and ran the Nutch crawler, but when I tried to load the crawl into Solr I got the message "No IndexWriters activated - check your configuration":

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -dir crawl/segments/
Indexer: starting at 2013-07-15 08:09:13
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
**No IndexWriters activated - check your configuration**

Indexer: finished at 2013-07-15 08:09:21, elapsed: 00:00:07
4

4 Answers

7

Make sure the indexer-solr plugin is included. Open conf/nutch-site.xml and add the plugin to the plugin.includes property, for example:

protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)

After I added the plugin, the "No IndexWriters activated - check your configuration" warning went away in my case.
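To confirm the setting is in place, one option is to read plugin.includes back out of nutch-site.xml and test it against the plugin id. The Python sketch below is only an illustration: it assumes the value is treated as a regular expression that must match the whole plugin id (my understanding of Nutch, not something confirmed in this thread), and the helper names `read_property` and `includes_plugin` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def includes_plugin(includes_value, plugin_id):
    """Assumption: the value is a regex that must match the full plugin id."""
    return re.fullmatch(includes_value, plugin_id) is not None


# Stand-in for the contents of conf/nutch-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    value = read_property(SAMPLE, "plugin.includes")
    print("indexer-solr included:", includes_plugin(value, "indexer-solr"))
```

For a real installation, read the text of conf/nutch-site.xml (and runtime/local/conf/nutch-site.xml, if you run from there) instead of using `SAMPLE`.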

See this thread: http://lucene.472066.n3.nabble.com/a-plugin-extending-IndexWriter-td4074353.html

answered 2013-07-31T21:42:38.753
2

The answers from @Tryskele and @Scott101 worked for me:

Add the plugin.includes property to both /conf/nutch-site.xml and runtime/local/conf/nutch-site.xml:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
</property>
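When several plugin lists are joined into one value like this, it can help to check which plugin ids the combined expression actually selects, since two alternatives that lose their separating `|` fuse into one that matches neither id. A rough Python sketch, assuming Nutch matches the expression against each complete plugin id; `selected_plugins` is a hypothetical helper and the ids are just examples:

```python
import re


def selected_plugins(includes_value, plugin_ids):
    """Return the ids that the plugin.includes regex matches in full."""
    pattern = re.compile(includes_value)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    ids = ["protocol-http", "indexer-solr", "urlnormalizer-basic"]
    # Alternatives properly separated by '|'.
    joined = "indexer-solr|urlnormalizer-(pass|regex|basic)|protocol-http"
    # Same list with the '|' before protocol-http left out.
    fused = "indexer-solr|urlnormalizer-(pass|regex|basic)protocol-http"
    print(selected_plugins(joined, ids))
    print(selected_plugins(fused, ids))
```

With the separator in place all three ids are selected; without it only indexer-solr is.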
answered 2015-03-24T16:29:01.563
0

Add the following property for the plugins in conf/nutch-site.xml:

<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
</property>

Let me know if that solves your problem.

answered 2014-09-20T06:13:08.787
0

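Not sure if this is still an issue, but I ran into the same problem and then realized that my src/plugin/build.xml was missing the indexer-solr plugin. Adding the following line and then recompiling Nutch fixed it for me: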

<ant dir="indexer-solr" target="deploy"/>
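To see whether a given src/plugin/build.xml already has that entry, you can look for an `<ant>` element with the same `dir` and `target`. A minimal Python sketch; the sample XML is a simplified stand-in for the real build file, and `has_deploy_entry` is a hypothetical helper, not part of Nutch:

```python
import xml.etree.ElementTree as ET


def has_deploy_entry(build_xml_text, plugin_dir):
    """True if some <ant> element deploys the given plugin directory."""
    root = ET.fromstring(build_xml_text)
    return any(
        el.get("dir") == plugin_dir and el.get("target") == "deploy"
        for el in root.iter("ant")
    )


# Simplified stand-in for src/plugin/build.xml (assumed layout).
SAMPLE_BUILD = """<project name="Nutch" default="deploy">
  <target name="deploy">
    <ant dir="index-basic" target="deploy"/>
    <ant dir="indexer-solr" target="deploy"/>
  </target>
</project>"""

if __name__ == "__main__":
    print("indexer-solr deployed:", has_deploy_entry(SAMPLE_BUILD, "indexer-solr"))
```

If the check comes back false for your build file, the plugin is not being built and deployed, so listing it in plugin.includes alone would not be enough.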

answered 2014-08-22T14:07:32.873