I'm using Solr 3.6.1 and Nutch 1.5, and it was working well: I crawl my site, index the data into Solr, and search with Solr. But two weeks ago it stopped working. When I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100` the command works, but when I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100000` it throws an exception, and in my log file I found this:
```
2013-02-05 17:04:20,697 INFO solr.SolrWriter - Indexing 250 documents
2013-02-05 17:04:20,697 INFO solr.SolrWriter - Deleting 0 documents
2013-02-05 17:04:21,275 WARN mapred.LocalJobRunner - job_local_0029
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:8080/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-05 17:04:21,883 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-05 17:04:21,887 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-05 17:04:21
2013-02-05 17:04:21,887 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
```
It was working fine two weeks ago. Has anyone run into a similar problem?
Hi, I just finished a crawl and hit the same exception, but when I looked at my logs/hadoop.log file I found this:
```
2013-02-06 22:02:14,111 INFO solr.SolrWriter - Indexing 250 documents
2013-02-06 22:02:14,111 INFO solr.SolrWriter - Deleting 0 documents
2013-02-06 22:02:14,902 WARN mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://localhost:8080/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:304)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-06 22:02:15,027 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-06 22:02:15,032 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-06 22:02:15
2013-02-06 22:02:15,032 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
2013-02-06 22:02:21,281 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-02-06 22:02:22,263 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2013-02-06 22:02:22, elapsed: 00:00:07
2013-02-06 22:02:22,263 INFO crawl.Crawl - crawl finished: crawl-20130206205733
```
I hope this helps in understanding the problem.
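For what it's worth, a quick way to locate these failures after a long crawl is to grep the Hadoop log for the Solr exception with a little context. The snippet below is only a sketch run against a tiny sample file standing in for the real log; in practice you would point `grep` at `logs/hadoop.log` in your Nutch runtime directory.

```shell
# Sample log standing in for logs/hadoop.log (contents illustrative).
cat > /tmp/hadoop_sample.log <<'EOF'
2013-02-06 22:02:14,111 INFO solr.SolrWriter - Indexing 250 documents
org.apache.solr.common.SolrException: Bad Request
request: http://localhost:8080/solr/update?wt=javabin&version=2
EOF

# -B/-A print one line of context before and after each match,
# so you see which batch was being indexed and which request failed.
grep -B 1 -A 1 "SolrException" /tmp/hadoop_sample.log
```

This surfaces the failing update request without scrolling through the whole MapReduce output.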