I am trying to connect my RStudio Server to my DSE Analytics cluster.
Setup:
- CentOS 7
- openjdk-1.8
- RStudio Server v1.0.136 (latest sparklyr, installed with devtools::install_github("rstudio/sparklyr"))
- DSE 5.0 (Spark 1.6.2)
- 5 DSE Analytics nodes in one DC of a cluster (the cluster is shared with another DC used for OLTP)
- RStudio Server running separately from DSE Analytics (on a VM)
Unlike in the sparklyr tutorial, I am bringing my own (DSE's) Spark, so SPARK_HOME was not set. Neither was JAVA_HOME. So:
> Sys.setenv(JAVA_HOME = '/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64')
> Sys.setenv(SPARK_HOME = '/usr/share/dse/spark/')
My config.yml (based on an example I found):
spark.cassandra.connection.host: <IP of one node>
spark.cassandra.auth.username: cassandra
spark.cassandra.auth.password: <PW>
sparklyr.defaultPackages:
- com.databricks:spark-csv_2.11:1.3.0
- com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1
- com.datastax.cassandra:cassandra-driver-core:3.0.2
My session info:
> devtools::session_info()
Session info --------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, linux-gnu
ui RStudio (1.0.136)
language (EN)
collate en_US.UTF-8
tz America/Mexico_City
date 2017-02-02
Packages ----------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.3.2)
backports 1.0.5 2017-01-18 CRAN (R 3.3.2)
base64enc 0.1-3 2015-07-28 CRAN (R 3.3.2)
config 0.2 2016-08-02 CRAN (R 3.3.2)
curl 2.3 2016-11-24 CRAN (R 3.3.2)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.2)
devtools 1.12.0 2016-12-05 CRAN (R 3.3.2)
digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
dplyr 0.5.0 2016-06-24 CRAN (R 3.3.2)
git2r 0.18.0 2017-01-01 CRAN (R 3.3.2)
htmltools 0.3.5 2016-03-21 cran (@0.3.5)
httpuv 1.3.3 2015-08-04 cran (@1.3.3)
httr 1.2.1 2016-07-03 CRAN (R 3.3.2)
jsonlite 1.2 2016-12-31 CRAN (R 3.3.2)
magrittr 1.5 2014-11-22 CRAN (R 3.3.2)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.2)
mime 0.5 2016-07-07 CRAN (R 3.3.2)
packrat 0.4.8-1 2016-09-07 CRAN (R 3.3.2)
R6 2.2.0 2016-10-05 CRAN (R 3.3.2)
Rcpp 0.12.9 2017-01-14 CRAN (R 3.3.2)
rprojroot 1.2 2017-01-16 CRAN (R 3.3.2)
rstudioapi 0.6 2016-06-27 CRAN (R 3.3.2)
shiny 1.0.0 2017-01-12 cran (@1.0.0)
sparklyr * 0.5.3-9000 2017-02-02 Github (rstudio/sparklyr@bd4aee0)
tibble 1.2 2016-08-26 CRAN (R 3.3.2)
withr 1.0.2 2016-06-20 CRAN (R 3.3.2)
xtable 1.8-2 2016-02-05 cran (@1.8-2)
yaml 2.1.14 2016-11-12 CRAN (R 3.3.2)
Now, when I try to create the Spark context, this is what I get:
> sc <- spark_connect(master = "spark://<IP of one node>", config = spark_config(file = "config.yml"), version = "1.6.2")
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (646): Gateway in port (8880) did not respond.
Path: /usr/share/dse/spark/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/spark-csv_2.11-1.3.0.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/commons-csv-1.1.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/univocity-parsers-1.5.1.jar', '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 646
---- Output Log ----
Failed to find Spark assembly in /usr/share/dse/spark/lib.
You need to build Spark before running this program.
---- Error Log ----
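From this output, my guess is that sparklyr does not recognize DSE Analytics. As far as I understand, DSE's Spark is deeply integrated with Cassandra through the connector, and it even has its own dse spark-submit. I am sure I am passing the wrong configuration to sparklyr; I just don't know what I should be passing it. Any help is welcome. Thank you.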
EDIT: I apparently get the same error with
> sc <- spark_connect(master="local")
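
For anyone trying to reproduce this, the "Failed to find Spark assembly" message in the output log can be checked directly from R. This is only a sketch: find_spark_assembly is a helper name made up for this post (it is not part of sparklyr or DSE), and the assumption that the launcher looks for a spark-assembly*.jar under SPARK_HOME/lib is read off the log message, not taken from the DSE documentation.

```r
# Hypothetical helper (not part of sparklyr or DSE): list any Spark assembly
# jars under <spark_home>/lib, the directory named in the output log.
find_spark_assembly <- function(spark_home) {
  lib_dir <- file.path(spark_home, "lib")
  # list.files() returns character(0) when the directory is missing
  # or when nothing matches the pattern.
  list.files(lib_dir, pattern = "^spark-assembly.*\\.jar$", full.names = TRUE)
}

# Check the SPARK_HOME currently set in this R session.
jars <- find_spark_assembly(Sys.getenv("SPARK_HOME"))
if (length(jars) == 0) {
  message("No spark-assembly jar found under SPARK_HOME/lib")
} else {
  print(jars)
}
```

If this reports no jar, the stock bin/spark-submit under SPARK_HOME would presumably fail the same way regardless of the master setting, which would be consistent with master="local" producing the same error.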