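I'm using Apache Toree (the version from GitHub). When I try to run a query against a PostgreSQL table, I get intermittent Scala compiler errors: when I run the same cell a second time, the errors go away and the code runs fine.
I'm looking for advice on how to debug these errors. They look strange, and they show up in the notebook log on standard output.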
error: missing or invalid dependency detected while loading class file 'QualifiedTableName.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'QualifiedTableName.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'FunctionIdentifier.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'FunctionIdentifier.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'DefinedByConstructorParams.class'.
...
The code is simple; it pulls a dataset from a Postgres table:
%AddDeps org.postgresql postgresql 42.1.4 --transitive
val props = new java.util.Properties();
props.setProperty("driver","org.postgresql.Driver");
val df = spark.read.jdbc(url = "jdbc:postgresql://postgresql/database?user=user&password=password",
table = "table", predicates = Array("1=1"), connectionProperties = props)
df.show()
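I checked the obvious things: Toree and Apache Spark both use Scala 2.11.8, and I built Apache Toree with APACHE_SPARK_VERSION=2.2.0, which matches the Spark version I downloaded.
For reference, this is the part of the Dockerfile I use to set up Toree and Spark: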
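To double-check the versions from inside the running kernel rather than from the build settings, something like the following could be run in a notebook cell. This is only a minimal sketch: `ClasspathCheck` and its methods are hypothetical helper names I made up for illustration, and it only reports what the JVM exposes, so it may not explain the intermittent failures by itself.

```scala
// Sketch: report which Scala runtime the kernel uses and where a class was loaded from.
// ClasspathCheck is a hypothetical helper, not part of Toree or Spark.
object ClasspathCheck {
  // Version of the scala-library actually loaded by this JVM, e.g. "2.11.8".
  def scalaVersion: String = scala.util.Properties.versionNumberString

  // Jar or directory a class was loaded from.
  // None when the JVM reports no code source (e.g. bootstrap classes).
  def locationOf(cls: Class[_]): Option[String] =
    for {
      source   <- Option(cls.getProtectionDomain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString
}

println(s"scala-library version: ${ClasspathCheck.scalaVersion}")
println(s"scala-library loaded from: ${ClasspathCheck.locationOf(classOf[scala.Option[_]])}")
```

In the notebook, the same helper could be pointed at one of the classes named in the errors, for example `ClasspathCheck.locationOf(Class.forName("org.apache.spark.sql.catalyst.FunctionIdentifier"))` (I believe that is the fully qualified name in Spark 2.2.0, but I have not confirmed it), to see which jar it is being loaded from.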
RUN wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz && tar -zxf spark-2.2.0-bin-hadoop2.7.tgz && chmod -R og+rw /opt/spark-2.2.0-bin-hadoop2.7 && chown -R a1414.a1414 /opt/spark-2.2.0-bin-hadoop2.7
RUN (curl https://bintray.com/sbt/rpm/rpm > /etc/yum.repos.d/bintray-sbt-rpm.repo)
RUN yum -y install --nogpgcheck sbt
RUN (unset http_proxy; unset https_proxy; yum -y install --nogpgcheck java-1.8.0-openjdk-devel.i686)
RUN (git clone https://github.com/apache/incubator-toree && cd incubator-toree && make clean release APACHE_SPARK_VERSION=2.2.0 ; exit 0)
RUN (. /opt/rh/rh-python35/enable; cd /opt/incubator-toree/dist/toree-pip ;python setup.py install)
RUN (. /opt/rh/rh-python35/enable; jupyter toree install --spark_home=/opt/spark-2.2.0-bin-hadoop2.7 --interpreters=Scala)