I am trying to connect my RStudio Server to my DSE Analytics cluster.

Setup:

  • CentOS 7
  • openjdk-1.8
  • RStudio Server v1.0.136 (latest sparklyr via devtools::install_github("rstudio/sparklyr"))
  • DSE 5.0 (Spark 1.6.2)
  • 5 DSE Analytics nodes in one DC of the cluster (shared with another DC used for OLTP)
  • RStudio Server running on its own VM, separate from the DSE Analytics nodes

Because, unlike in the sparklyr tutorial, I am bringing my own Spark (DSE's), SPARK_HOME is not set. Neither is JAVA_HOME. So:

> Sys.setenv(JAVA_HOME = '/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64')  
> Sys.setenv(SPARK_HOME = '/usr/share/dse/spark/')
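
As a quick sanity check (plain base R, nothing sparklyr-specific), both variables can be verified to point at directories that actually exist before connecting:

# Both paths should resolve to existing directories; an empty or wrong
# SPARK_HOME would make spark-submit fail before sparklyr ever talks to it
stopifnot(dir.exists(Sys.getenv("JAVA_HOME")),
          dir.exists(Sys.getenv("SPARK_HOME")))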

My config.yml (found an example for it here):

spark.cassandra.connection.host: <IP of one node>
spark.cassandra.auth.username: cassandra
spark.cassandra.auth.password: <PW>

sparklyr.defaultPackages:
- com.databricks:spark-csv_2.11:1.3.0
- com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1
- com.datastax.cassandra:cassandra-driver-core:3.0.2
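
In case the YAML layout itself is the problem (as far as I can tell, spark_config(file = ...) reads it through the config package, which expects the entries nested under a top-level default: block), the same settings can also be built directly in R instead of through config.yml; a sketch with the same placeholders as above:

library(sparklyr)

# Same settings as config.yml, set programmatically on a spark_config list
conf <- spark_config()
conf$spark.cassandra.connection.host <- "<IP of one node>"
conf$spark.cassandra.auth.username   <- "cassandra"
conf$spark.cassandra.auth.password   <- "<PW>"
conf$sparklyr.defaultPackages <- c(
  "com.databricks:spark-csv_2.11:1.3.0",
  "com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1",
  "com.datastax.cassandra:cassandra-driver-core:3.0.2"
)

# master kept exactly as in my attempt below; a standalone Spark master
# usually also needs its port spelled out (7077 by default)
sc <- spark_connect(master = "spark://<IP of one node>",
                    config = conf, version = "1.6.2")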

My session info:

> devtools::session_info()
Session info --------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, linux-gnu           
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Mexico_City         
 date     2017-02-02                  

Packages ----------------------------------------
 package    * version    date       source                           
 assertthat   0.1        2013-12-06 CRAN (R 3.3.2)                   
 backports    1.0.5      2017-01-18 CRAN (R 3.3.2)                   
 base64enc    0.1-3      2015-07-28 CRAN (R 3.3.2)                   
 config       0.2        2016-08-02 CRAN (R 3.3.2)                   
 curl         2.3        2016-11-24 CRAN (R 3.3.2)                   
 DBI          0.5-1      2016-09-10 CRAN (R 3.3.2)                   
 devtools     1.12.0     2016-12-05 CRAN (R 3.3.2)                   
 digest       0.6.12     2017-01-27 CRAN (R 3.3.2)                   
 dplyr        0.5.0      2016-06-24 CRAN (R 3.3.2)                   
 git2r        0.18.0     2017-01-01 CRAN (R 3.3.2)                   
 htmltools    0.3.5      2016-03-21 cran (@0.3.5)                    
 httpuv       1.3.3      2015-08-04 cran (@1.3.3)                    
 httr         1.2.1      2016-07-03 CRAN (R 3.3.2)                   
 jsonlite     1.2        2016-12-31 CRAN (R 3.3.2)                   
 magrittr     1.5        2014-11-22 CRAN (R 3.3.2)                   
 memoise      1.0.0      2016-01-29 CRAN (R 3.3.2)                   
 mime         0.5        2016-07-07 CRAN (R 3.3.2)                   
 packrat      0.4.8-1    2016-09-07 CRAN (R 3.3.2)                   
 R6           2.2.0      2016-10-05 CRAN (R 3.3.2)                   
 Rcpp         0.12.9     2017-01-14 CRAN (R 3.3.2)                   
 rprojroot    1.2        2017-01-16 CRAN (R 3.3.2)                   
 rstudioapi   0.6        2016-06-27 CRAN (R 3.3.2)                   
 shiny        1.0.0      2017-01-12 cran (@1.0.0)                    
 sparklyr   * 0.5.3-9000 2017-02-02 Github (rstudio/sparklyr@bd4aee0)
 tibble       1.2        2016-08-26 CRAN (R 3.3.2)                   
 withr        1.0.2      2016-06-20 CRAN (R 3.3.2)                   
 xtable       1.8-2      2016-02-05 cran (@1.8-2)                    
 yaml         2.1.14     2016-11-12 CRAN (R 3.3.2)  

Now, when I try to create the Spark context, this is what I get:

> sc <- spark_connect(master = "spark://<IP of one node>", config = spark_config(file = "config.yml"), version = "1.6.2")  
Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (646): Gateway in port (8880) did not respond.
    Path: /usr/share/dse/spark/bin/spark-submit
    Parameters: --class, sparklyr.Backend, --jars, '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/spark-csv_2.11-1.3.0.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/commons-csv-1.1.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/univocity-parsers-1.5.1.jar', '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 646


---- Output Log ----
Failed to find Spark assembly in /usr/share/dse/spark/lib.
You need to build Spark before running this program.

---- Error Log ----

From this output, my guess is that sparklyr is not recognizing DSE Analytics. As far as I understand, DSE's Spark is deeply integrated with Cassandra through the connector and even has its own dse spark-submit. I am sure I am passing the wrong configuration to sparklyr; I am just lost as to what to pass. Any help is welcome. Thank you.
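
For what it's worth, the "Failed to find Spark assembly" message seems to come from the plain spark-submit launcher looking for a spark-assembly-*.jar under $SPARK_HOME/lib, so a quick look at what is actually in there (and whether the dse wrapper is on the PATH) might narrow it down; plain base R again:

# What the launcher sees under SPARK_HOME: Spark 1.x's spark-submit expects
# a spark-assembly-*.jar in $SPARK_HOME/lib
spark_home <- Sys.getenv("SPARK_HOME")
list.files(file.path(spark_home, "lib"))

# DSE ships its own wrapper (dse spark-submit); check whether it is available
Sys.which("dse")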

EDIT: I get the same error even with > sc <- spark_connect(master = "local")
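
To take sparklyr out of the picture entirely, the launcher it shells out to can be run directly from R; if that prints the same assembly error, the problem is SPARK_HOME / the launcher rather than the connection settings (just a diagnostic sketch):

# Run the exact spark-submit sparklyr calls and capture its output
submit <- file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit")
system2(submit, args = "--version", stdout = TRUE, stderr = TRUE)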
