Problem
After setting up the AWS Glue libraries, I am running into the following error:
PS C:\Users\[user]\Documents\[company]\projects\code\data-lake\etl\tealium> python visitor.py
20/04/05 19:33:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "visitor.py", line 9, in <module>
glueContext = GlueContext(sc.getOrCreate())
File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 45, in __init__
File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 66, in _get_glue_scala_context
TypeError: 'JavaPackage' object is not callable
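From what I can tell, this TypeError usually means py4j could not resolve the Glue Scala class on the JVM classpath: the attribute lookup falls back to a package stub (py4j's JavaPackage), and calling that stub raises exactly this error. A minimal pure-Python illustration of the mechanism (the JavaPackage class below is a stand-in I wrote for illustration, not py4j's real one):

```python
class JavaPackage:
    """Stand-in for py4j's JavaPackage: when the JVM cannot resolve a
    class name, attribute lookup returns another package stub instead
    of a class, so the name chain always 'succeeds'."""
    def __getattr__(self, name):
        return JavaPackage()

jvm = JavaPackage()
# Mimics what context.py does at line 66: the Glue Scala class was never
# loaded into the JVM, so this "class" is really a package stub.
GlueContextClass = jvm.com.amazonaws.services.glue.GlueContext
try:
    GlueContextClass("dummy")
except TypeError as exc:
    print(exc)  # 'JavaPackage' object is not callable
```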
Setup
I am trying to install the AWS Glue ETL libraries in a virtual environment using Pipenv, so I have the following .env file with these environment variables:
HADOOP_HOME="C:\Users\[user]\AppData\Local\Spark\winutils"
SPARK_HOME="C:\Users\[user]\AppData\Local\Spark\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_231"
PATH="${HADOOP_HOME}\bin"
PATH="${SPARK_HOME}\bin:${PATH}"
PATH="${JAVA_HOME}\bin:${PATH}"
SPARK_CONF_DIR="C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\conf"
PYTHONPATH="${SPARK_HOME}/python/:${PYTHONPATH}"
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
PYTHONPATH="C:/Users/[user]/Documents/[company]/projects/code/aws-glue-libs-glue-1.0/PyGlue.zip:${PYTHONPATH}"
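One detail that may matter here: Windows uses ; as the PATH/PYTHONPATH separator, while the ${VAR}:${PATH} chains above join entries with :, so the resulting values could end up as one long, unresolvable entry. A small throwaway helper (hypothetical, not part of any library) to list the entries that do not resolve on disk:

```python
import os

def missing_entries(path_value, sep=os.pathsep):
    """Split a PATH-like string on the platform's separator and
    return the entries that do not exist on disk."""
    return [entry for entry in path_value.split(sep)
            if entry and not os.path.exists(entry)]

# Example: every entry in PYTHONPATH should resolve to a real path;
# anything printed here would be invisible to the interpreter.
print(missing_entries(os.environ.get("PYTHONPATH", "")))
```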
My code is very simple so far; I only create the Glue context, like this:
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.conf import SparkConf
sc = SparkContext()
glueContext = GlueContext(sc.getOrCreate())
print(glueContext)
print(sc)
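For what it's worth, I also wondered whether the Glue jars are visible to Spark at all; if I understand the aws-glue-libs setup correctly, the checkout only gains its jars (in a jarsv1 directory) after running the glue-setup script, so skipping that step would leave nothing for the JVM to load. A quick hypothetical check (the directory name is an assumption; adjust to your layout):

```python
import glob
import os

def find_jars(jars_dir):
    """Return the jar files in a directory; Spark needs the Glue jars
    on its driver classpath before GlueContext can resolve."""
    return sorted(glob.glob(os.path.join(jars_dir, "*.jar")))

# "jarsv1" is where the glue-setup script is supposed to place the jars
# (assumed name; adjust to your checkout).
print(find_jars(r"C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\jarsv1"))
```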
Does anyone know what might be causing this?