r - sparklyr Hadoop configuration

Question

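I'm sorry that this question is hard to make fully reproducible, because it involves a running Spark context (referred to as sc below). I am trying to set a hadoopConfiguration in sparklyr, specifically to access swift/objectStore objects from RStudio sparklyr as Spark objects, but more generally to do what a Scala call on hadoopConfiguration does. Something like this (Scala code):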

sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url", "https://identity.open.softlayer.com/v3/auth/tokens")

where sc is a running Spark context. In SparkR I can run (R code):

hConf = SparkR:::callJMethod(sc, "hadoopConfiguration") 
SparkR:::callJMethod(hConf, "set", paste("fs.swift.service.keystone.auth.url"), paste("https://identity.open.softlayer.com/v3/auth/tokens",sep=""))

In sparklyr I have tried every incantation I can think of, but my best guess is (again R code):

sc %>% invoke("set", paste("fs.swift.service.keystone.auth.url"), paste("https://identity.open.softlayer.com/v3/auth/tokens",sep=""))

but this results in the following terse error (irregular spelling included):

Error in enc2utf8(value) : argumemt is not a character vector

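Of course, I tried encoding the inputs in every way I could think of (naturally enc2utf8(value) was the first, but also many others, including lists and as.character(as.list(...)), which appears to be a favorite of the sparklyr coders). Any suggestions would be greatly appreciated. I have combed the sparklyr source code and cannot find any mention of hadoopConfiguration in the sparklyr GitHub repository, so I am afraid I am missing something very basic in the core configuration. I have also tried passing these settings in the config.yml used by the core spark_connect() call. That does set "fs.swift.service.keystone.auth.url" as the sc$config$s.swift.service.keystone.auth.url setting, but it apparently fails to set these values in the core hadoopConfiguration.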

By the way, I'm using Spark 1.6, Scala 2.10, R 3.2.1, and sparklyr_0.4.19.

score 6 · Accepted Answer

I figured it out:

set_swift_config <- function(sc){
  #get spark_context
  ctx <- spark_context(sc)

  #get the java spark context that wraps ctx
  jsc <- invoke_static(
    sc,
    "org.apache.spark.api.java.JavaSparkContext",
    "fromSparkContext",
    ctx
  )

  #set the swift configs:
  hconf <- jsc %>% invoke("hadoopConfiguration")
  hconf %>% invoke("set","fs.swift.service.keystone.auth.url",
                   "https://identity.open.softlayer.com/v3/auth/tokens" )
}

It can be called with set_swift_config(sc).
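If more than one property needs to be set, the same approach can be wrapped in a small loop. The following is only a sketch under stated assumptions: swift_properties and set_hadoop_properties are hypothetical helper names, not part of sparklyr, and "keystone" is the service name used in the question. It relies on the same invoke_static / invoke calls as above, plus the get method of the Hadoop Configuration object to read a value back.

```r
# Hypothetical helper (not part of sparklyr): prefix each setting name with
# "fs.swift.service.<service>." to build the full Hadoop property name.
swift_properties <- function(service, settings) {
  stopifnot(is.character(settings), !is.null(names(settings)))
  stats::setNames(
    settings,
    paste0("fs.swift.service.", service, ".", names(settings))
  )
}

# Hypothetical helper: apply a named character vector of properties to the
# Hadoop configuration behind a running sparklyr connection `sc`.
set_hadoop_properties <- function(sc, properties) {
  ctx <- sparklyr::spark_context(sc)
  jsc <- sparklyr::invoke_static(
    sc,
    "org.apache.spark.api.java.JavaSparkContext",
    "fromSparkContext",
    ctx
  )
  hconf <- sparklyr::invoke(jsc, "hadoopConfiguration")
  for (key in names(properties)) {
    # each value is passed as a length-one character vector
    sparklyr::invoke(hconf, "set", key, properties[[key]])
  }
  invisible(hconf)
}

# Build the property list; this part runs without a Spark connection.
props <- swift_properties(
  "keystone",
  c("auth.url" = "https://identity.open.softlayer.com/v3/auth/tokens")
)

# With a live connection `sc` (assumed to exist), apply and read back:
# hconf <- set_hadoop_properties(sc, props)
# sparklyr::invoke(hconf, "get", names(props)[1])
```

Returning hconf invisibly makes it possible to check that a value was actually stored, which the config.yml route described in the question did not achieve.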
