apache-spark - Connecting Spark to BI tools such as Power BI and Tableau

Question

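I need to connect Spark to Power BI, but I don't know which drivers are required. I am running Spark in local mode without Apache Hive installed, so I have no hive-site.xml file with which to configure the Thrift server. After starting the Thrift server, I launched $SPARK_HOME\bin\beeline.cmd and connected to the Thrift server with the command !connect jdbc:hive2://localhost:10000, using Administrator as the user ID (the same as on my local machine) and a blank password. The output was: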

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: Administrator
Enter password for jdbc:hive2://localhost:10000:
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.0.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ

The connection appears to be established, but when I query the databases with the command show databases;, beeline shows this error:

Error: org.apache.thrift.transport.TTransportException: java.net.SocketException: Software caused connection abort: socket write error (state=08S01,code=0)

and this error appears in the Thrift server cmd window:

Exception in thread "HiveServer2-Handler-Pool: Thread-XXX"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HiveServer2-Handler-Pool: Thread-XXX"

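I don't understand this error. Please help. I also want to connect Spark to Power BI Desktop, which is installed on my local machine. Can anyone point me to some reading on how to set up this connection?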

score 0 · Accepted Answer

@Birla, it looks like a TCP error, as mentioned in the question asked here.
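To see whether the failure really is at the TCP level, one option is to probe the Thrift server port before and after running a query. The following is only a minimal sketch, assuming the server listens on localhost:10000 as in the question; the helper name thrift_port_open is hypothetical and not part of Spark or Hive.

```python
import socket


def thrift_port_open(host="localhost", port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only checks that something is listening on the port. It does not
    speak the Thrift protocol, so it cannot tell whether the server is
    healthy enough to answer queries.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling thrift_port_open() right after the failed show databases; would show whether the server still accepts connections. If it does, the TCP layer is probably fine and the java.lang.OutOfMemoryError in the server window is the more likely cause of the dropped session.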

Running the Thrift server on a local machine is not recommended, because it needs substantial processing power and dedicated metastore servers to handle authentication and parallelism.

Recommended: install a ready-to-use Hortonworks or Cloudera VM, then access it from Power BI.
