
I have a Spark cluster set up on Amazon EMR with RStudio installed on it. I am trying to connect SparkR to Redshift via the package spark-redshift_2.11-0.5.0.jar, and I am hitting the error: Failed to find data source: com.databricks.spark.redshift.

I placed spark-redshift_2.11-0.5.0.jar in /usr/lib/spark/jars, where all the other Spark jar files live. I am using the code snippet from the "Reading data using R" section of the GitHub repo https://github.com/databricks/spark-redshift

.libPaths(c(.libPaths(), '/usr/lib/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/usr/lib/spark')
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Attempt 1: plain local context
sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory = "50g"))
sqlContext <- sparkRSQL.init(sc)

# Attempt 2: point the driver at the directory holding the jar
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "5g",
                                    spark.driver.library.path = "/usr/lib/spark/jars"))

# Attempt 3: pull the connector in as a Spark package
sc <- sparkR.init(sparkPackages = "com.databricks:spark-redshift_2.11:0.5.0")

df <- read.df(NULL,
              "com.databricks.spark.redshift",
              tempdir = "s3n://location",
              dbtable = "schemaname.tablename",
              url = "jdbc:redshift://hostname:5439/dbname?user=user&password=pwd")  # "jdbc:" prefix as in the README
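For reference, here is a consolidated initialization I would expect to work, as a sketch only: it assumes SparkR 1.x as shipped on EMR, uses the documented sparkJars and sparkPackages arguments of sparkR.init, and reuses the jar path from above (hostnames and paths are the same placeholders).

sparkR.stop()  # stop any context left over from earlier attempts

# Supply the connector both as a local jar and as a Maven coordinate,
# so it is on the classpath of the driver and the executors.
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "5g"),
                  sparkJars = "/usr/lib/spark/jars/spark-redshift_2.11-0.5.0.jar",
                  sparkPackages = "com.databricks:spark-redshift_2.11:0.5.0")
sqlContext <- sparkRSQL.init(sc)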

I expected the code to pull the data from Redshift and save it in a data frame, but instead I get the following error:

Caused by: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    ... 36 more
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)