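I started working on a Spark Streaming job and wrote a producer for the Kinesis endpoint. With that working, I started on the consumer, but I have run into a problem building it.
I am using the assembly plugin to create a single jar containing all dependencies. The project's dependencies are as follows.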
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "c3p0" % "c3p0" % "0.9.1.+",
  "com.amazonaws" % "aws-java-sdk" % "1.10.4.1",
  "mysql" % "mysql-connector-java" % "5.1.33",
  "com.amazonaws" % "amazon-kinesis-client" % "1.5.0"
)
When I run assembly, the files compile, but the merge phase fails with this error:
[error] (streamingClicks/*:assembly) deduplicate: different file contents found in the following:
[error] /Users/adam/.ivy2/cache/org.apache.spark/spark-network-common_2.10/jars/spark-network-common_2.10-1.4.1.jar:META-INF/maven/com.google.guava/guava/pom.properties
[error] /Users/adam/.ivy2/cache/com.google.guava/guava/bundles/guava-18.0.jar:META-INF/maven/com.google.guava/guava/pom.properties
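This started when I added the spark-streaming-kinesis-asl dependency. How can I resolve it? I could mark the dependency as provided and then add the jar to the classpath, but that is really not what I want to do.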
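For reference, one direction that might work is a custom merge strategy for just the conflicting Guava metadata. This is a sketch, not a verified fix. It assumes sbt-assembly 0.12.0 or later (where the key is named assemblyMergeStrategy; older releases use a different key and syntax), and it assumes the Maven metadata under META-INF/maven is not needed at runtime, which I have not confirmed for this job.

```scala
// build.sbt fragment (assumption: sbt-assembly >= 0.12.0 on sbt 0.13.x)
assemblyMergeStrategy in assembly := {
  // The two jars ship different copies of Guava's Maven metadata
  // (pom.properties, and possibly pom.xml). Drop them from the fat jar.
  case PathList("META-INF", "maven", "com.google.guava", "guava", xs @ _*) =>
    MergeStrategy.discard
  // Everything else falls through to the plugin's default strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

The rule is deliberately limited to the one path named in the error, so any other duplicate would still fail the build instead of being silently merged.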