
I'm working on an sbt-managed Spark project with a spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).

I've added the following line to build.sbt:

"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

So build.sbt looks as follows:

name := "Movie Rating"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= {
  val sparkVersion =  "1.6.0"
  Seq(
     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
     "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
     "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
     "org.apache.kafka" %% "kafka" % "0.9.0.0",
     "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
    )
}

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("scala", xs @ _*) => MergeStrategy.discard
  case PathList("META-INF", "maven", "org.slf4j", xs @ _* ) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

unmanagedBase <<= baseDirectory { base => base / "lib" }

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

When I execute sbt assembly I get the following error:

java.lang.RuntimeException: Please add any Spark dependencies by 
   supplying the sparkVersion and sparkComponents. Please remove: 
   org.apache.spark:spark-core:1.6.0:provided

2 Answers


Possibly related: https://github.com/databricks/spark-csv/issues/150

Could you try adding spIgnoreProvided := true to your build.sbt?
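For context, spIgnoreProvided is a setting key defined by the sbt-spark-package plugin, which is also the plugin that raises the error above. A minimal sketch of where it would sit, assuming that plugin is enabled in the build (the plugin version shown is an assumption):

In project/plugins.sbt:

resolvers += "Spark Packages repo" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.3")  // version is an assumption

and in build.sbt:

sparkVersion := "1.6.0"                                // let the plugin supply the Spark artifacts
sparkComponents ++= Seq("sql", "streaming", "mllib")   // adds spark-sql, spark-streaming, spark-mllib
spIgnoreProvided := true                               // assumption: keeps the plugin from rejecting Spark deps declared as "provided"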

(This may not be the answer; I would have posted it as a comment, but I don't have enough reputation.)

answered 2016-12-03T18:35:15.417

NOTE: I still cannot reproduce the issue, but I don't think that matters here.

java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.

In your case, your build.sbt is missing an sbt resolver to look up the spark-cloudant dependency. You should add the following line to build.sbt:

resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
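After adding the resolver, a quick way to confirm that the spark-cloudant artifact now resolves is to run the update task before assembling again:

$ sbt update assembly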

PROTIP I strongly recommend using spark-shell first and switching to sbt only once you feel comfortable with the package (especially if you're new to sbt and possibly to the other libraries/dependencies as well). It's too much to digest in one bite. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.
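As a sketch of that workflow, spark-packages.org publishes packages under group:artifact:version coordinates that Spark's --packages flag can fetch directly, so the library can be tried interactively before the build file is touched:

$ spark-shell --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10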

answered 2016-12-06T09:02:00.363