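I'm fairly new to Giraph, and I'm trying to get a Giraph edit-compile-deploy loop working for our code. I can run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/, but I'm stuck on a ClassNotFoundException when I run my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.
Versions
- Hadoop: Hadoop 2.0.0-cdh4.4.0
- Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
PageRankBenchmark runs fine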
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1
...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)
GiraphRunner with SimpleShortestPathsVertex also runs fine
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)
Bonus: the results are correct:
$ hadoop fs -cat goutput/shortestpathsC2/p*
0 1.0
2 2.0
1 0.0
3 1.0
4 5.0
But my modified SimpleShortestPathsVertex gets a ClassNotFoundException
The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) looks OK:
$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
But my run blows up:
$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1
Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
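My best guess...
...after looking around, is that GiraphRunner may not be handling -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I don't see that class being used. I also tried setting HADOOP_CLASSPATH to my jar, but that didn't fix the problem.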
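One way to narrow this down: the stack trace above shows Class.forName being called from ConfigurationUtils under RunJar.main, so the lookup appears to fail in the submitting JVM rather than in a map task. The following is a minimal diagnostic sketch, not part of Giraph or Hadoop. The class name ClassLookupCheck and its methods are hypothetical helpers that use only the JDK. It tries the same kind of lookup twice, once through the default class loader and once through a loader that also searches the jar.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical diagnostic helper (an assumption, not a Giraph/Hadoop class).
final class ClassLookupCheck {

    /** Tries to resolve className through loader and reports what happened. */
    static String check(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers.
            Class.forName(className, false, loader);
            return "loadable";
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            // Typically the class file was located, but a class it depends on
            // (for example its superclass) is missing from the classpath.
            return "located, but linking failed: " + e;
        }
    }

    /** Builds a loader that delegates to parent first, then searches jarPath. */
    static URLClassLoader loaderForJar(String jarPath, ClassLoader parent) {
        try {
            URL jarUrl = new File(jarPath).toURI().toURL();
            return new URLClassLoader(new URL[] {jarUrl}, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jarPath, e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ClassLookupCheck <className> <jarPath>");
            return;
        }
        ClassLoader system = ClassLoader.getSystemClassLoader();
        System.out.println("default loader: " + check(args[0], system));
        try (URLClassLoader withJar = loaderForJar(args[1], system)) {
            System.out.println("with jar:       " + check(args[0], withJar));
        }
    }
}
```

Run it with KdlSimpleShortestPathsVertex and ~/kdl_hadoop_play.jar as the two arguments, using a classpath that includes the Giraph jar and the output of the hadoop classpath command. If the first line says "not found" and the second says "loadable", the jar itself is fine and the class is simply missing from the client-side classpath. A "linking failed" result would instead point at a missing dependency such as the Giraph vertex base class.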
Any help would be great!
PageRankBenchmark output
14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient: File System Counters
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient: Job Counters
14/08/01 11:42:44 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient: Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient: Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient: CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient: Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient: Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient: Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient: Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient: Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient: Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient: Total (milliseconds)=3442
SimpleShortestPathsVertex output
14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient: File System Counters
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient: Job Counters
14/08/01 11:47:46 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient: Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient: Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient: CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient: Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient: Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient: Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient: Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient: Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient: Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient: Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient: Total (milliseconds)=805