I am trying to run SilviaClusteringExample from SANSA-Stack/SANSA-Examples (https://github.com/SANSA-Stack/SANSA-Examples).
I have set up a Spark cluster on GCP Dataproc with one master node and 3 worker nodes. Following the given instructions, I run spark-submit with Hadoop filesystem paths passed to the --input and --output options.
The command I run is:
spark-submit \
  --class net.sansa_stack.examples.spark.ml.clustering.SilviaClusteringExample \
  --master spark://<masternode_ip>:7077 \
  /home/<user_name>/sansa/SANSA-Examples-develop/sansa-examples-spark/target/sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar \
  --input /user/<user_name> \
  --output /user/<user_name>/out.txt
This command fails with the following error:
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.jena.riot.system.RiotLib
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:135)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:118)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance$lzycompute(NTripleReader.scala:207)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance(NTripleReader.scala:207)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.get(NTripleReader.scala:209)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:148)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
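Since the error says RiotLib "could not be initialized" (rather than not found), I suspect a Jena version conflict between the assembly jar and the jars already on the cluster classpath. As a diagnostic I am considering something like the following (hypothetical, not yet run; the jar path is the one from the command above, and /usr/lib/spark/jars is my assumption for where Dataproc keeps Spark's jars):

# Check that the Jena RIOT classes are actually packaged in the assembly jar
jar tf /home/<user_name>/sansa/SANSA-Examples-develop/sansa-examples-spark/target/sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar | grep 'org/apache/jena/riot/system/RiotLib'

# Check whether another copy of Jena is visible to Spark on the cluster nodes
ls /usr/lib/spark/jars | grep -i jena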
Files in the Hadoop filesystem:
hadoop fs -ls -h
Found 2 items
-rw-r--r-- 2 <user_name> hadoop 70.2 K 2019-07-22 06:58 SilviaClustering_HairStylist_TaxiDriver.txt
-rwxr--r-- 2 <user_name> hadoop 0 2019-07-22 07:09 out.txt
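Note that out.txt already exists with size 0, presumably left over from an earlier failed run. In case a pre-existing output path causes a separate failure on resubmission, I plan to remove it first (standard HDFS command; the path matches the --output argument above):

hadoop fs -rm -r /user/<user_name>/out.txt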
Any help resolving the above issue would be appreciated. Thanks.