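I am trying to access the Glue catalog from Scala code for the first time.

I already ran into some trouble building my project with Maven (this question was helpful: How to setup a local development environment for Scala Spark ETL to run in AWS Glue?).

But now that I am trying to run my code on an EMR cluster, I get a java.lang.NoClassDefFoundError.

This is my code: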

import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, DynamicRecord, GlueContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory
import org.apache.spark.sql.functions.{col, month, year}

object JoinAndRelation {

  private val logger = LoggerFactory.getLogger(getClass)

  def main(sysArgs: Array[String]): Unit = {
    //Spark session creation with connection to Glue Catalog
    implicit val spark: SparkSession = SparkSession
      .builder
      .config(new SparkConf().setAppName("TestGlueAccess"))
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    val glueContext: GlueContext = new GlueContext(sc)
...

This is the error:

19/02/08 15:35:26 INFO Client: 
     client token: N/A
     diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/amazonaws/services/glue/GlueContext
    at org.sergio.poc.JoinAndRelation$.main(JoinAndRelation.scala:41)
    at org.sergio.poc.JoinAndRelation.main(JoinAndRelation.scala)

I was able to compile it with Maven by adding glue-assembly.jar as a dependency. I also tried adding aws-java-sdk-core, but that did not work...

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>glue-assembly</artifactId>
  <version>1.0</version>
  <scope>system</scope>
  <systemPath>${project.basedir}/libs/glue-assembly.jar</systemPath>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>1.11.445</version>
</dependency>

Finally, this is the command I use to run it:

spark-submit --class org.sergio.poc.JoinAndRelation --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 MyFirstScalaMavenProject-1.0-SNAPSHOT.jar
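
Since the project compiles but fails at run time, one possible explanation is that the Glue classes are on the compile classpath (via the `system`-scoped dependency) but are not visible to the driver on the cluster. The following is only a diagnostic sketch, not a fix: `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced here, and the only thing taken from the post is the class name in the stack trace. It tries to load a class by name and reports whether the class loader can see it.

```scala
object ClasspathCheck {

  // Hypothetical helper: returns true if `className` can be loaded by the
  // class loader that loaded this object, without initializing the class.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // The class itself is missing from the classpath.
      case _: ClassNotFoundException => false
      // The class was found but something it links against is missing.
      case _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Class name taken from the NoClassDefFoundError in the stack trace.
    val glueClass = "com.amazonaws.services.glue.GlueContext"
    if (isOnClasspath(glueClass))
      println(s"$glueClass is visible on the classpath")
    else
      println(s"$glueClass is NOT visible on the classpath")
  }
}
```

Calling `ClasspathCheck.isOnClasspath(...)` at the top of `main`, before `new GlueContext(sc)`, would show whether the jar submitted with spark-submit actually carries (or is accompanied by) the Glue classes when the job runs on EMR.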

Has anyone faced the same problem?
