
I have an offline PySpark cluster (no internet access) where I need to install the graphframes library.

I have manually downloaded the jar from here and added it to $SPARK_HOME/jars/, but when I try to use it I get the following error:

error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term typesafe in package com,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term scalalogging in value com.typesafe,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access type LazyLogging in value com.slf4j,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.

What is the correct way to install it offline, together with all of its dependencies?


1 Answer


I managed to install the graphframes library. First, I tracked down the dependencies of graphframes, which are:

scala-logging-api_xx-xx.jar
scala-logging-slf4j_xx-xx.jar

where xx stands for the matching Scala and jar versions. Then I placed them in the correct path. Since I work on a Cloudera machine, the correct path is:

/opt/cloudera/parcels/SPARK2/lib/spark2/jars/

If you cannot put them in that directory on the cluster (because you lack root permissions and your admin is extremely lazy), you can simply add them to your spark-submit / spark-shell invocation:

spark-submit ..... --driver-class-path /path-for-jar/  \
                   --jars /../graphframes-0.5.0-spark2.1-s_2.11.jar,/../scala-logging-slf4j_2.10-2.1.2.jar,/../scala-logging-api_2.10-2.1.2.jar

This works for Scala. To use graphframes with Python, you need to download the graphframes jar and then extract its contents via the shell:

#Extract the JAR contents (a jar is just a zip archive)
jar xf graphframes_graphframes-0.3.0-spark2.0-s_2.11.jar
#Zip the extracted graphframes/ package, keeping it as the top-level directory
#so that `import graphframes` can resolve it
zip -r graphframes.zip graphframes
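If the `jar` and `zip` CLI tools are not available on the machine, the same repackaging can be sketched with the Python standard library alone. The function name below is illustrative, not part of graphframes:

```python
import zipfile

def repack_python_from_jar(jar_path, zip_path, pkg="graphframes/"):
    """Copy the Python package entries out of a jar (a jar is just a
    zip archive) into a standalone zip that can go on PYTHONPATH."""
    with zipfile.ZipFile(jar_path) as jar, \
         zipfile.ZipFile(zip_path, "w") as out:
        for name in jar.namelist():
            # Keep only the graphframes/ package; skip the .class files.
            if name.startswith(pkg):
                out.writestr(name, jar.read(name))

# Hypothetical usage -- adjust the jar name to your version:
# repack_python_from_jar("graphframes_graphframes-0.3.0-spark2.0-s_2.11.jar",
#                        "graphframes.zip")
```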

Then add the zipped file to your Python path in spark-env.sh or your bash_profile:

export PYTHONPATH=$PYTHONPATH:/..proper path/graphframes.zip:.

Then launch your shell/submit (again with the same parameters as for Scala), and `import graphframes` works fine.
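The reason this works is that Python can import packages directly from a zip archive placed on sys.path, which is exactly what the PYTHONPATH entry above does. A minimal self-contained demonstration (demo_pkg is a made-up stand-in for graphframes):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing a one-file package.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "VERSION = '0.1'\n")

# Equivalent to putting the zip on PYTHONPATH before launching Python.
sys.path.insert(0, zip_path)

import demo_pkg
print(demo_pkg.VERSION)  # prints 0.1
```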

This link was very helpful for this solution.

answered 2018-11-02T10:20:27.580