
I have a Spark application with the sbt file shown below.
It works on my local machine, but when I submit it to an EMR cluster running Spark 1.6.1, I get the following error:

java.lang.NoClassDefFoundError: net/liftweb/json/JsonAST$JValue

I'm using "sbt package" to build the jar.

build.sbt:

organization := "com.foo"
name := "FooReport"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1"
  ,"net.liftweb" % "lift-json_2.10" % "2.6.3"
  ,"joda-time" % "joda-time" % "2.9.4"
)

Any idea what is going on?


1 Answer


I found a solution, and it works!

The problem is that "sbt package" does not include the dependency jars in the output jar. To get around that I tried sbt-assembly, but when I ran it I got a lot of "deduplicate" errors.

Eventually I came across this blog post, which made everything clear:
http://queirozf.com/entries/creating-scala-fat-jars-for-spark-on-sbt-with-sbt-assembly-plugin

To submit a Spark job to a Spark cluster (via spark-submit), you need to include all of the dependencies (other than Spark itself) in the jar; otherwise they will not be available to your job.

  1. Create "assembly.sbt" under the /project folder.
  2. Add this line: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
  3. Then paste the assemblyMergeStrategy code below into your build.sbt

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

Then run "sbt assembly".
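As a side note (my addition, not part of the original answer): sbt-assembly writes the fat jar under target/scala-2.10/ with a name derived from the project name and version. If you want a fixed jar name for your spark-submit scripts, sbt-assembly 0.14.x lets you set it explicitly in build.sbt:

// Optional, assuming sbt-assembly 0.14.x: give the fat jar a fixed name.
// The default here would be FooReport-assembly-1.0.jar.
assemblyJarName in assembly := "FooReport.jar"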

Now you have one big fat jar containing all of your dependencies. Depending on which libraries you depend on, it can be hundreds of MB. In my case I am using AWS EMR, which already has Spark 1.6.1 installed on it. To exclude the spark-core lib from your jar, you can use the "provided" keyword:

"org.apache.spark" %% "spark-core" % "1.6.1" % "provided"

Here is the final build.sbt file:

organization := "com.foo"
name := "FooReport"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
  ,"net.liftweb" % "lift-json_2.10" % "2.6.3"
  ,"joda-time" % "joda-time" % "2.9.4"
)

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
answered 2016-07-13T12:04:08.753