I am trying to index my Nutch crawl data with the following command:

bin/nutch index -D solr.server.url="https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/sc97b4177a_600f_4040_9309_e632c116443f/solr/localWebCollection/" -D solr.auth=true -D solr.auth.username="USER" -D solr.auth.password="PASS" final/crawl/crawldb -linkdb final/crawl

I don't get any errors, but when I run it, it finishes after a few seconds and nothing is indexed. Here is my log:

2016-07-22 20:03:09,599 INFO  indexer.IndexingJob - Indexer: starting at            2016-07-22 20:03:09
2016-07-22 20:03:09,707 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
2016-07-22 20:03:09,708 INFO  indexer.IndexingJob - Indexer: URL filtering: false
2016-07-22 20:03:09,708 INFO  indexer.IndexingJob - Indexer: URL normalizing: false
2016-07-22 20:03:10,216 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-07-22 20:03:10,216 INFO  indexer.IndexingJob - Active IndexWriters :
SolrIndexWriter
    solr.server.type : Type of SolrServer to communicate with (default 'http' however options include 'cloud', 'lb' and 'concurrent')
    solr.server.url : URL of the Solr instance (mandatory)
    solr.zookeeper.url : URL of the Zookeeper URL (mandatory if 'cloud' value for solr.server.type)
    solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if 'lb' value for solr.server.type)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.commit.size : buffer size when sending to Solr (default 1000)
    solr.auth : use authentication (default false)
    solr.auth.username : username for authentication
    solr.auth.password : password for authentication

2016-07-22 20:03:10,220 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: final/crawl/crawldb
2016-07-22 20:03:10,220 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: final/crawl
2016-07-22 20:03:10,376 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-22 20:03:10,495 WARN  indexer.IndexerMapReduce - Ignoring linkDb for indexing, no linkDb found in path: final/crawl
2016-07-22 20:03:11,381 WARN  conf.Configuration - file:/tmp/hadoop-sdavari/mapred/staging/sdavari1351924025/.staging/job_local1351924025_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-07-22 20:03:11,385 WARN  conf.Configuration - file:/tmp/hadoop-sdavari/mapred/staging/sdavari1351924025/.staging/job_local1351924025_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-07-22 20:03:11,551 WARN  conf.Configuration - file:/tmp/hadoop-sdavari/mapred/local/localRunner/sdavari/job_local1351924025_0001/job_local1351924025_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-07-22 20:03:11,557 WARN  conf.Configuration - file:/tmp/hadoop-sdavari/mapred/local/localRunner/sdavari/job_local1351924025_0001/job_local1351924025_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-07-22 20:03:11,880 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-07-22 20:03:13,437 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-07-22 20:03:13,448 INFO  solr.SolrUtils - Authenticating as: f4a73627-777b-4d13-af60-df67be41ecb5
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: content dest: content
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: title dest: title
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: host dest: host
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: url dest: url
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: segment dest: segment
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: digest dest: digest
2016-07-22 20:03:13,673 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2016-07-22 20:03:14,605 INFO  solr.SolrUtils - Authenticating as: f4a73627-777b-4d13-af60-df67be41ecb5
2016-07-22 20:03:14,613 INFO  solr.SolrMappingReader - source: content dest: content
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: title dest: title
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: host dest: host
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: url dest: url
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: segment dest: segment
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: digest dest: digest
2016-07-22 20:03:14,614 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2016-07-22 20:03:15,685 INFO  indexer.IndexingJob - Indexer: number of documents indexed, deleted, or skipped:
2016-07-22 20:03:15,695 INFO  indexer.IndexingJob - Indexer: finished at            2016-07-22 20:03:15, elapsed: 00:00:06

Any ideas on how I can fix this and get it to index my data? The URL is for the Bluemix Retrieve and Rank service, but it is built on top of Apache Solr, so I assume I can use it as long as my Nutch and Solr schemas match. Is that right?
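
Two details in the log may be worth checking first. The indexer warns that no linkDb was found in final/crawl, and the "number of documents indexed, deleted, or skipped" line lists no counts. The command also passes no segment argument after the crawldb. The sketch below is a dry run that only prints a candidate command. It assumes the common Nutch 1.x layout (crawldb/, linkdb/ and segments/ under final/crawl), which the post does not confirm, and the helper name build_index_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints a candidate index command instead of running it.
# ASSUMPTION: the crawl directory contains crawldb/, linkdb/ and segments/.
CRAWL_DIR="final/crawl"
SOLR_URL="https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/sc97b4177a_600f_4040_9309_e632c116443f/solr/localWebCollection/"

# Hypothetical helper: $1 = crawl directory, $2 = Solr collection URL.
build_index_cmd() {
  printf '%s ' bin/nutch index \
    -D "solr.server.url=$2" \
    -D solr.auth=true \
    -D solr.auth.username=USER \
    -D solr.auth.password=PASS \
    "$1/crawldb" -linkdb "$1/linkdb" -dir "$1/segments"
  printf '\n'
}

build_index_cmd "$CRAWL_DIR" "$SOLR_URL"
```

If the printed paths match what is actually on disk, the command can be run as printed. Individual segment directories can be listed in place of -dir if only some segments should be indexed.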
