I have configured a local Nutch 2.3.1 instance running in Eclipse on MacOS 10.11.5 (El Capitan), as described here: https://wiki.apache.org/nutch/RunNutchInEclipse

As the data store I configured MongoDB 2.6.12, which also runs on my local MacOS machine. I took the Gora configuration from here: http://www.aossama.com/search-engine-with-apache-nutch-mongodb-and-elasticsearch/

ivy.xml

<dependency org="org.apache.gora" name="gora-mongodb" rev="0.6.1" conf="*->default" />

gora.properties

gora.datastore.default=org.apache.gora.mongodb.store.MongoStore
gora.mongodb.override_hadoop_configuration=false
gora.mongodb.mapping.file=/gora-mongodb-mapping.xml
gora.mongodb.servers=localhost:27017
# I tried several server settings like localhost, 127.0.0.1, 127.0.0.1:27017, ...
gora.mongodb.db=nutch

I did not change gora-mongodb-mapping.xml.
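The innermost frames of the trace are in MongoMappingBuilder.fromFile and MongoMapping.newDocumentField, i.e. while the mapping file is being read, not in network code. One rough way to rule the mapping file in or out is a small Java check. It assumes (a) that gora.mongodb.mapping.file is resolved as a classpath resource, and (b) that every <field> element is expected to carry name, docfield and type attributes. Both are assumptions about Gora 0.6.1, and MappingCheck is a hypothetical helper, not part of Gora or Nutch.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper, not part of Gora or Nutch.
class MappingCheck {

    // Returns one message per <field> element that lacks an attribute assumed
    // here to be required (name, docfield, type). Missing attributes come back
    // from getAttribute() as "", so an empty string is treated as "missing".
    static List<String> findIncompleteFields(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList fields = doc.getElementsByTagName("field");
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            for (String attr : new String[] {"name", "docfield", "type"}) {
                if (f.getAttribute(attr).isEmpty()) {
                    problems.add("field '" + f.getAttribute("name") + "' missing " + attr);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the value of gora.mongodb.mapping.file is looked up on the
        // classpath, so a leading slash means "root of the classpath".
        String resource = "/gora-mongodb-mapping.xml";
        try (InputStream in = MappingCheck.class.getResourceAsStream(resource)) {
            if (in == null) {
                System.out.println(resource + " is NOT on the classpath");
                return;
            }
            List<String> problems = findIncompleteFields(in);
            System.out.println(problems.isEmpty() ? "all fields complete" : String.join("\n", problems));
        }
    }
}
```

Running it from the same Eclipse launch configuration as InjectorJob would show whether the mapping file Nutch sees there is the one you expect.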

nutch-site.xml

<property>
 <name>storage.data.store.class</name>
 <value>org.apache.gora.mongodb.store.MongoStore</value>
 <description>Default class for storing data</description>
</property>

If I run the inject command, hadoop.log shows this confusing result:

2016-07-12 23:23:16,818 INFO  crawl.InjectorJob - InjectorJob: starting at 2016-07-12 23:23:16
2016-07-12 23:23:16,819 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: /Users/myaccount/Documents/Nutch/urls
2016-07-12 23:23:17,054 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-12 23:23:17,416 ERROR store.MongoStore - 
2016-07-12 23:23:17,417 ERROR store.MongoStore - [Ljava.lang.StackTraceElement;@4b5189ac
2016-07-12 23:23:17,418 ERROR store.MongoStore - Error while initializing MongoDB store: java.lang.NullPointerException
2016-07-12 23:23:17,419 ERROR crawl.InjectorJob - InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:267)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:290)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:299)
Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:131)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 7 more
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoMappingBuilder.fromFile(MongoMappingBuilder.java:123)
    at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:118)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.gora.mongodb.store.MongoMapping.newDocumentField(MongoMapping.java:109)
    at org.apache.gora.mongodb.store.MongoMapping.addClassField(MongoMapping.java:169)
    at org.apache.gora.mongodb.store.MongoMappingBuilder.loadPersistentClass(MongoMappingBuilder.java:169)
    at org.apache.gora.mongodb.store.MongoMappingBuilder.fromFile(MongoMappingBuilder.java:112)
    ... 10 more

After two days, I have run out of ideas.

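I cannot spot any useful hints in the log files. The MongoDB log shows no connection attempts (let alone active connections). Using mongo I can connect to the database, and requesting http://localhost:27017 shows the expected message ("It looks like you are trying to access MongoDB over HTTP on the native driver port.") along with a corresponding log entry. If I switch the data store to Cassandra, injection works as expected, so Nutch itself seems to work too.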
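To check reachability from inside a JVM rather than from the mongo shell, a plain TCP probe is a minimal sketch. PortCheck is a hypothetical helper; a successful connect only shows that something is listening on the port, not that the MongoDB driver used by Gora can talk to it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of Gora or Nutch.
class PortCheck {

    // True if a plain TCP connection to host:port succeeds within timeoutMs.
    // It does not speak the MongoDB wire protocol.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Try both host spellings used in gora.mongodb.servers.
        for (String host : new String[] {"localhost", "127.0.0.1"}) {
            System.out.println(host + ":27017 reachable = " + canConnect(host, 27017, 2000));
        }
    }
}
```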

Does anyone know what I am missing, or understand what hadoop.log is trying to tell me?

Any help would be greatly appreciated! Thanks.

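Update: I also tried this configuration on an Ubuntu 14.04 server, where it works as expected. So I assume my problem is related to the connection between Nutch and MongoDB running on the Mac. (In case anyone wonders: I am trying to get this configuration running on my Mac because I want to do some local development that does not require a server connection.)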
