apache-spark - Why does insertInto fail when working with tables in non-default databases?

Question

I'm using Spark 1.4.0 (PySpark). I loaded a DataFrame from a Hive table with this query:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
table1_contents = sqlContext.sql("SELECT * FROM my_db.table1")

When I try to insert the data from table1_contents, after some transformations, into table2 using the DataFrameWriter#insertInto function:

sqlContext.createDataFrame(transformed_data_from_table1).write.insertInto('my_db.table2')

I get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o364.insertInto.
: org.apache.spark.sql.AnalysisException: no such table my_db.table2;

I know the table exists, because when I run:

print sqlContext.tableNames('my_db')

both table1 and table2 are listed. Can anyone help with this issue?

score 3 · Accepted Answer

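I had a similar issue. It looks like the insertInto function may have a bug when writing to a non-default database. After I changed the target table to one in the default database, it worked fine.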

score 1 · Accepted Answer

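Hi, I don't know whether you have solved your problem yet. I ran into a similar problem at work and solved it. My Spark version is 1.4.0, so I don't think there is a bug in the program, @Ton Torres. The problem is that you used sqlContext instead of hiveContext. When you need to operate on Hive, you'd better use a hiveContext to create the DataFrame, like this: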

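    // Create the DataFrame through a HiveContext so that Hive tables can be resolved.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val dfResult = hiveContext.createDataFrame(temp, structType)
    // Select the database first, then insert using the unqualified table name.
    hiveContext.sql("use default")
    dfResult.write.insertInto("tablename")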

Good luck.
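
For PySpark users, the same pattern (select the database, then insert with an unqualified table name) can be sketched roughly as follows. The helper name insert_into_qualified is hypothetical and not part of Spark; it only relies on sql() and DataFrame.write.insertInto(). This thread does not confirm that issuing USE for a non-default database avoids the error on 1.4.0, so treat it as something to try rather than a verified fix.

```python
def insert_into_qualified(sql_context, df, qualified_name):
    """Insert df into a table named 'db.table' by switching the database first.

    Hypothetical helper: sql_context is expected to be a HiveContext and
    df a DataFrame created from that same context.
    """
    # Split "my_db.table2" into ("my_db", ".", "table2");
    # a bare name such as "table2" yields an empty database part.
    db, sep, table = qualified_name.rpartition(".")
    if sep:
        # Make db the current database so the unqualified name resolves there.
        sql_context.sql("USE {0}".format(db))
    # insertInto expects the target table to exist already.
    df.write.insertInto(table)
```

For example, insert_into_qualified(sqlContext, df, 'my_db.table2') would run USE my_db and then df.write.insertInto('table2'). Note that the current database stays switched afterwards, which affects later unqualified table names in the same context.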

score 1 · Accepted Answer

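This is a reported bug. Apparently, the issue is only resolved in the upcoming 1.6.0 release.

As a workaround, you can do what you described, or go with the default database as mentioned by @guoxian. You could also try out version 1.6.0-SNAPSHOT.

EDIT: The JIRA issue I linked is for the Scala version of Spark, so I can't say whether this issue is fixed in PySpark v1.6.0. Sorry for the confusion.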

score 0 · Accepted Answer

I wasn't able to get this to work:

sqlContext.createDataFrame(transformed_data_from_table1).write.insertInto('my_db.table2')

However, it seems that SparkSQL supports INSERT statements passed as strings:

sqlContext.sql("INSERT INTO TABLE my_db.table2...");

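This one works.

That said, I'm still looking forward to the day my original question is answered and the original call works (hopefully in a future version of Spark, if this is a bug).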
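
One way to apply the SQL route to a DataFrame is to register it as a temporary table and select from it. The sketch below is an assumption, not the statement I actually ran (that one is elided above): the helper name insert_via_sql and the temporary table name are hypothetical, and the DataFrame has to come from the same context that runs the INSERT, with columns in the same order as the target table.

```python
def insert_via_sql(sql_context, df, target_table, temp_name="tmp_insert_source"):
    """Insert df into target_table with a SQL INSERT statement.

    Hypothetical helper: temp_name is an arbitrary temporary table name.
    """
    # Expose the DataFrame to SQL under a temporary name.
    df.registerTempTable(temp_name)
    # Build the statement; the qualified name is resolved by the SQL parser.
    statement = "INSERT INTO TABLE {0} SELECT * FROM {1}".format(
        target_table, temp_name)
    sql_context.sql(statement)
    # Return the statement so the caller can log or inspect it.
    return statement
```

Calling insert_via_sql(sqlContext, df, 'my_db.table2') would issue INSERT INTO TABLE my_db.table2 SELECT * FROM tmp_insert_source.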
