
I am using apache toree (the version from github). When I try to run a query against a postgresql table, I get intermittent scala compiler errors (when I run the same cell twice, the errors disappear and the code runs fine).

I am looking for advice on how to debug these errors. They look strange (they show up in the notebook log on stdout).

error: missing or invalid dependency detected while loading class file 'QualifiedTableName.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'QualifiedTableName.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'FunctionIdentifier.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'FunctionIdentifier.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'DefinedByConstructorParams.class'.
...

The code is simple: pull a dataset from a postgres table:

%AddDeps org.postgresql postgresql 42.1.4 --transitive
// Register the PostgreSQL JDBC driver and read the table through Spark's JDBC data source
val props = new java.util.Properties()
props.setProperty("driver", "org.postgresql.Driver")
val df = spark.read.jdbc(url = "jdbc:postgresql://postgresql/database?user=user&password=password",
  table = "table", predicates = Array("1=1"), connectionProperties = props)
df.show()

I have checked the obvious: both toree and apache spark use scala 2.11.8, and I built apache toree with APACHE_SPARK_VERSION=2.2.0, which matches the spark distribution I downloaded.
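One quick way to confirm this from inside the notebook is to print the versions the running kernel actually sees, rather than the ones used at build time. A minimal sketch, assuming the cell runs on the Toree Scala interpreter with `spark` already bound:

// Versions as seen by the running kernel (not the build configuration)
println(scala.util.Properties.versionString)   // Scala standard library the kernel runs on
println(spark.version)                         // Spark version of the bound SparkSession
println(org.apache.spark.SPARK_VERSION)        // version constant compiled into the Spark jars

If the Scala version printed here differs from the 2.11.8 that Spark 2.2.0 expects, that mismatch would explain errors like the ones above.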

For reference, here is the relevant part of the Dockerfile I use to set up toree and spark:

RUN wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz && tar -zxf spark-2.2.0-bin-hadoop2.7.tgz && chmod -R og+rw /opt/spark-2.2.0-bin-hadoop2.7 && chown -R a1414.a1414 /opt/spark-2.2.0-bin-hadoop2.7
RUN (curl https://bintray.com/sbt/rpm/rpm > /etc/yum.repos.d/bintray-sbt-rpm.repo)
RUN yum -y install --nogpgcheck sbt
RUN (unset http_proxy; unset https_proxy; yum -y install --nogpgcheck java-1.8.0-openjdk-devel.i686)
RUN (git clone https://github.com/apache/incubator-toree && cd incubator-toree && make clean release APACHE_SPARK_VERSION=2.2.0 ; exit 0)
RUN (. /opt/rh/rh-python35/enable; cd /opt/incubator-toree/dist/toree-pip ;python setup.py install)
RUN (. /opt/rh/rh-python35/enable; jupyter toree install --spark_home=/opt/spark-2.2.0-bin-hadoop2.7 --interpreters=Scala)

2 Answers


I had a similar problem, and it seemed to go away simply by re-evaluating the cell in the Jupyter notebook, or by restarting the kernel and then re-evaluating the cell. Annoying.

answered 2018-09-24T15:08:34.503

As cchantep said in the comments, the Scala version you are using is probably not the one Spark was built with.

The simplest solution is to check which version Spark uses and switch to it, for example on a Mac:

brew switch scala 2.11.8
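To see which Scala the Spark distribution itself ships, a quick check (assuming the spark-shell from /opt/spark-2.2.0-bin-hadoop2.7 is on your PATH) is to run the following inside spark-shell and compare it with the version your kernel was built against:

// Inside spark-shell of the distribution under test
scala.util.Properties.versionNumberString  // should report the 2.11.x line for Spark 2.2.0 prebuilt binaries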
answered 2018-03-03T11:38:43.400