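This is my first attempt at accessing the Glue Catalog from Scala code.
I already ran into some trouble just building the project with Maven (this was helpful: How to setup a local development environment for Scala Spark ETL to run in AWS Glue?).
Now that I'm trying to run my code on an EMR cluster, I get a java.lang.NoClassDefFoundError.
Here is my code: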
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, DynamicRecord, GlueContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory
import org.apache.spark.sql.functions.{col, month, year}

object JoinAndRelation {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(sysArgs: Array[String]): Unit = {
    // Spark session creation with connection to Glue Catalog
    implicit val spark: SparkSession = SparkSession
      .builder
      .config(new SparkConf().setAppName("TestGlueAccess"))
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    val glueContext: GlueContext = new GlueContext(sc)
    ...
Here is the error:
19/02/08 15:35:26 INFO Client:
     client token: N/A
     diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/amazonaws/services/glue/GlueContext
    at org.sergio.poc.JoinAndRelation$.main(JoinAndRelation.scala:41)
    at org.sergio.poc.JoinAndRelation.main(JoinAndRelation.scala)
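A NoClassDefFoundError means the class was available when the code was compiled but its definition cannot be found at run time. One way to confirm what the driver actually sees is to probe the class loader directly, for example by calling a helper like the one below at the start of main. This is a minimal sketch: ClasspathProbe and isOnClasspath are hypothetical names, not part of Spark or Glue, and only Class.forName from the JDK is used. In cluster mode the driver runs inside YARN, so the printed lines would end up in the container logs rather than on the console.

```scala
// Hypothetical diagnostic helper (not part of Spark or AWS Glue).
// Checks at run time whether a class is visible to this code's class loader.
object ClasspathProbe {

  /** Returns true if `className` can be loaded by the class loader that loaded this object. */
  def isOnClasspath(className: String): Boolean =
    try {
      // initialize = false: look the class up without running its static initializers
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false // class not found at all
      case _: LinkageError           => false // found, but a class it depends on is missing
    }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "com.amazonaws.services.glue.GlueContext",
      "org.apache.spark.SparkContext"
    )
    candidates.foreach { name =>
      println(s"$name -> ${if (isOnClasspath(name)) "found" else "MISSING"}")
    }
  }
}
```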
I was able to compile it with Maven by adding glue-assembly.jar as a dependency. I also tried adding aws-java-sdk-core, but that didn't help:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>glue-assembly</artifactId>
    <version>1.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/glue-assembly.jar</systemPath>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-core</artifactId>
    <version>1.11.445</version>
</dependency>
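With <scope>system</scope>, Maven uses the JAR on the compile classpath, but system scope behaves much like provided: the common packaging plugins typically do not bundle such a dependency into the application JAR. A quick check is whether GlueContext.class is actually inside the JAR handed to spark-submit. The sketch below assumes that JAR is MyFirstScalaMavenProject-1.0-SNAPSHOT.jar in the working directory; JarContentsCheck and its methods are hypothetical names, and only java.util.jar.JarFile from the JDK is used.

```scala
import java.util.jar.JarFile

// Hypothetical helper: checks whether a compiled class was packaged into a JAR.
object JarContentsCheck {

  /** Converts a fully qualified class name to its entry path inside a JAR. */
  def classEntryName(className: String): String =
    className.replace('.', '/') + ".class"

  /** Returns true if the JAR at `jarPath` contains the compiled class `className`. */
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.getEntry(classEntryName(className)) != null
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Assumed default: the application JAR named in the spark-submit command
    val jarPath = args.headOption.getOrElse("MyFirstScalaMavenProject-1.0-SNAPSHOT.jar")
    val cls = "com.amazonaws.services.glue.GlueContext"
    println(s"$cls packaged in $jarPath: ${jarContainsClass(jarPath, cls)}")
  }
}
```

If this prints false, the class is only on the compile classpath, which would be consistent with the NoClassDefFoundError above.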
Finally, this is the command I use to run it:
spark-submit --class org.sergio.poc.JoinAndRelation --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 MyFirstScalaMavenProject-1.0-SNAPSHOT.jar
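The command above ships only the application JAR. spark-submit's --jars option takes a comma-separated list of additional JARs to put on the driver and executor classpaths, and all options must come before the application JAR. The sketch below only assembles that argument list; SubmitCommand is a hypothetical name, the libs/glue-assembly.jar path simply mirrors the Maven systemPath, and whether the Glue classes then work correctly on EMR is not something this post establishes.

```scala
// Hypothetical helper: builds the spark-submit arguments from the post,
// optionally shipping extra local JARs with --jars.
object SubmitCommand {

  def build(mainClass: String, appJar: String, extraJars: Seq[String] = Nil): Seq[String] = {
    val base = Seq(
      "spark-submit",
      "--class", mainClass,
      "--master", "yarn",
      "--deploy-mode", "cluster",
      "--executor-memory", "2G",
      "--num-executors", "2"
    )
    // spark-submit expects a single comma-separated value for --jars
    val jarsOpt = if (extraJars.isEmpty) Nil else Seq("--jars", extraJars.mkString(","))
    // The application JAR must come after all spark-submit options
    (base ++ jarsOpt) :+ appJar
  }

  def main(args: Array[String]): Unit = {
    val cmd = build(
      "org.sergio.poc.JoinAndRelation",
      "MyFirstScalaMavenProject-1.0-SNAPSHOT.jar",
      Seq("libs/glue-assembly.jar") // assumed path, mirroring the Maven systemPath
    )
    println(cmd.mkString(" "))
  }
}
```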
Has anyone run into the same problem?