我正在尝试以spark-submit
Scala 集群模式运行应用程序。它在 PySpark 中运行良好,但在尝试使用 Scala 运行时,弹出上述错误。如果我必须添加 SBT 和 Maven 依赖项,您能否详细说明该过程(我无法在 Google 中找到)
这是我的代码:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object MyFirst {
def main(args: Array[String]) {
// create Spark context with Spark configuration
val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
// get threshold
val threshold = args(1).toInt
// read in text file and split each document into words
val tokenized = sc.textFile(args(0)).flatMap(_.split(" "))
// count the occurrence of each word
val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)
// filter out words with fewer than threshold occurrences
val filtered = wordCounts.filter(_._2 >= threshold)
// count characters
val charCounts = filtered.flatMap(_._1.toCharArray).map((_, 1)).reduceByKey(_ + _)
System.out.println(charCounts.collect().mkString(", "))
}
}
这是我的 Build.sbt
name := "MyFirst"
scalaVersion := "2.10.3"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
我的 Spark 提交是: spark-submit MyFirst --class MyFirst /home/ram/Downloads/sbt/src/target/scala-2.10/MyFirst_2.10-0.1.0-SNAPSHOT.jar