scala - Using HiveContext, Hive tables are created locally in the local metastore_db instead of on the cluster; where should I place my hive-site.xml?

Question

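I have created a SparkContext object and tried to read a text file from a Hadoop server (not on my local machine), and I am able to retrieve it.

When I try to read a Hive table (which lives on a separate machine, the cluster), I cannot do it, and when I create a Hive table it is created locally in metastore_db: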

objHiveContext.sql("create table yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

I tried setting the metastore warehouse location:

objHiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://ServerIP:HiveportNum/apps/hive/warehouse")

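and also objHiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://serverIp:portNumber/apps/hive/warehouse")

I even put hive-site.xml in the conf folder of the Spark machine.

How can I make my Scala application pick up hive-site.xml and read the metastore information from it, and where should I put my hive-site.xml?

I also put it inside my application, since it was suggested that it can go anywhere on the classpath. I added it and can see it just above my pom.xml file, but my Scala application still runs in local mode.

The table (yahoo_orc_table) is created locally in D:\user\hive\warehouse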

score 1 · Accepted Answer

The only place it needs to be is the Spark conf directory. If you put it there and it still does not work, the problem lies elsewhere, possibly in the contents of hive-site.xml.
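
One way to narrow this down is to check whether the JVM running the application can actually see a hive-site.xml, and what that file says about the metastore. The following is a minimal sketch using only the JDK; the object name HiveSiteCheck and its helper names are made up for illustration, and hive.metastore.uris is the property that points a client at a remote metastore service. If no file is found, or the property is not set, Spark typically falls back to an embedded Derby metastore (the local metastore_db directory).

```scala
import java.net.URL
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper, not part of Spark or Hive.
object HiveSiteCheck {

  /** Looks up a resource on the classpath; None means the JVM cannot see it. */
  def locate(name: String = "hive-site.xml"): Option[URL] =
    Option(Thread.currentThread().getContextClassLoader.getResource(name))

  /** Text of the first child element with the given tag, if present. */
  private def text(parent: Element, tag: String): Option[String] = {
    val nodes = parent.getElementsByTagName(tag)
    if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
  }

  /** Reads one <property> value from a Hadoop-style configuration XML file. */
  def property(url: URL, key: String): Option[String] = {
    val in = url.openStream()
    try {
      val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
      val props = doc.getElementsByTagName("property")
      (0 until props.getLength)
        .map(i => props.item(i).asInstanceOf[Element])
        .find(p => text(p, "name").contains(key))
        .flatMap(p => text(p, "value"))
    } finally in.close()
  }

  def main(args: Array[String]): Unit = locate() match {
    case None =>
      println("hive-site.xml is not on the classpath")
    case Some(url) =>
      val uris = property(url, "hive.metastore.uris").getOrElse("<not set>")
      println("found: " + url)
      println("hive.metastore.uris = " + uris)
  }
}
```

Run it with the same classpath as the Spark application; a different classpath can give a misleading result.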

score 0 · Accepted Answer

This issue is resolved in Spark 2. After placing the hive-site.xml file in the conf folder of the Spark machine, you can use:

import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("interfacing spark sql to hive metastore without configuration file")
  .config("hive.metastore.uris", "thrift://host:port") // replace with your hivemetastore service's thrift url
  .enableHiveSupport() // don't forget to enable hive support
  .getOrCreate()

spark.sql("create table yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")

This code creates the table "yahoo_orc_table" in Hive on the cluster.
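
To confirm that the table really landed on the cluster and not in a local warehouse directory, it can help to print what the session sees. This is a rough sketch that assumes Spark 2.x with Hive support on the classpath and a reachable metastore; the object and method names (MetastoreReport, report) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for inspecting where a Hive-enabled session points.
object MetastoreReport {
  def report(spark: SparkSession, table: String): Unit = {
    // Directory Spark uses for managed table data
    println("warehouse dir: " + spark.conf.get("spark.sql.warehouse.dir"))
    // Databases visible through the configured metastore
    spark.sql("show databases").show(false)
    // Table details, including the Location row (expected to be an hdfs:// path on the cluster)
    spark.sql(s"describe formatted $table").show(100, false)
  }
}
```

With the session from the snippet above, the call would be MetastoreReport.report(spark, "yahoo_orc_table"). A Location under a local path such as D:\user\hive\warehouse suggests the session is still using the local metastore.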
