
I am trying to connect from Spark 2.3 running on IBM Analytics Engine to a ScyllaDB database running on IBM Cloud.

I am launching the spark-shell like this...

$ spark-shell --master local[1] \
       --files jaas.conf \
       --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0,datastax:spark-cassandra-connector:2.3.0-s_2.11,commons-configuration:commons-configuration:1.10 \
       --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
       --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
       --conf spark.cassandra.connection.host=xxx1.composedb.com,xxx2.composedb.com,xxx3.composedb.com \
       --conf spark.cassandra.connection.port=28730 \
       --conf spark.cassandra.auth.username=scylla \
       --conf spark.cassandra.auth.password=SECRET \
       --conf spark.cassandra.connection.ssl.enabled=true \
       --num-executors 1  \
       --executor-cores 1 

Then I run the following Spark Scala code:

import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._

val stocksRdd = sc.cassandraTable("stocks", "stocks")

stocksRdd.count()

However, I see a bunch of warnings:

18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:06 WARN Session: Error creating pool to /xxx.xxx.xxx.xxx:28730
com.datastax.driver.core.exceptions.ConnectionException: [/xxx.xxx.xxx.xxx:28730] Pool was closed during initialization
...

However, after the stack traces in the warnings, I see the output I expect:

res2: Long = 4 

If I navigate to the Compose UI, I see a map JSON:

[
  {"xxx.xxx.xxx.xxx:9042":"xxx1.composedb.com:28730"},
  {"xxx.xxx.xxx.xxx:9042":"xxx2.composedb.com:28730"},
  {"xxx.xxx.xxx.xxx:9042":"xxx3.composedb.com:28730"}
]

It seems the warnings are related to this address map.
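
To see which addresses the cluster itself advertises (the addresses the driver compares my contact points against at startup), I can run something like the following from the same spark-shell. This is a rough sketch that assumes the CassandraConnector class from the connector package loaded above:

import scala.collection.JavaConverters._
import com.datastax.spark.connector.cql.CassandraConnector

// Ask the control host which peers it advertises; the driver matches the
// configured contact points against these rpc_address values at startup.
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute("SELECT peer, rpc_address FROM system.peers")
    .all().asScala
    .foreach(row => println(s"${row.getInet("peer")} -> ${row.getInet("rpc_address")}"))
}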

What do these warnings mean? Can I ignore them?


Note: I have seen a similar question, but I believe this one is different because of the map file, and because I have no control over how Compose has set up the ScyllaDB cluster.


1 Answer


This is just a warning. It is happening because the IPs that Spark is trying to reach are not known to Scylla itself. Apparently Spark is connecting to the cluster and retrieving the expected information, so you should be fine.
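
If the repeated driver messages clutter the shell output, one option (just a sketch, assuming the default log4j 1.x logging that Spark 2.3 ships with) is to raise the Java driver's logger level from inside the spark-shell:

import org.apache.log4j.{Level, Logger}

// Quiet the Java driver's Cluster/Session warnings without touching
// Spark's own logging or the connector's behaviour.
Logger.getLogger("com.datastax.driver.core").setLevel(Level.ERROR)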

Answered on 2018-08-23T19:29:30.827