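I am using RStudio to process data from Hive, connecting through RJDBC. RJDBC converts the result of a select statement into a data frame. Unfortunately, it does not seem to recognize the Hive column data types "date" and "timestamp", so during dbReadTable(conn, "db2.ibor_lending") those columns are converted to character, which is bad.
Do you have any ideas about this? I do not want to cast the columns again in R, because that 1. adds overhead, 2. introduces coupling, and 3. increases maintenance effort.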
library(DBI)
library(rJava)
library(RJDBC)
print("Attempting Hive Connection...")
hadoop.class.path <- list.files(path = "/usr/hdp/current/hadoop-client", pattern = "jar", full.names = TRUE)
hadoop.client.lib <- list.files(path = "/usr/hdp/current/hadoop-client/lib", pattern = "jar", full.names = TRUE)
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.hdfs.lib.path <- list.files(path = "/usr/hdp/current/hadoop-hdfs-client", pattern = "jar", full.names = TRUE)
zookeeper.lib.path <- list.files(path = "/usr/hdp/current/zookeeper-client", pattern = "jar", full.names = TRUE)
mapred.class.path <- list.files(path = "/usr/hdp/current/hadoop-mapreduce-client", pattern = "jar", full.names = TRUE)
cp <- c(hive.class.path, mapred.class.path, hadoop.class.path, hadoop.client.lib, hadoop.hdfs.lib.path)
.jinit(classpath=cp, parameters="-Djavax.security.auth.useSubjectCredsOnly=false")
drv <- JDBC("org.apache.hive.jdbc.HiveDriver","/usr/hdp/current/hive-client/lib/hive-jdbc.jar",identifier.quote="`")
conn <- dbConnect(drv,"jdbc:hive2://xxx:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@yyyy")
show_databases <- dbGetQuery(conn, "select * from db2.ibor_lending LIMIT 100")
show_datatypes <- dbGetQuery(conn, "describe db2.ibor_lending")
show_table <- dbReadTable(conn, "db2.ibor_lending")
The result is:
Hive: col_name data_type comment
cutoffdate timestamp
R dataframe: ibor_lending.cutoffdate character
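For reference, the cast I would like to avoid could at least be driven by the DESCRIBE output instead of hard-coded column names. The following is only a minimal sketch under assumptions: cast_hive_temporal is a hypothetical helper (not part of RJDBC or DBI), the sample values are made up, the strings are assumed to arrive as "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DD", and the time zone "UTC" is a guess that may not match the cluster.

```r
# Hypothetical helper (not part of RJDBC): convert character columns to
# Date/POSIXct according to the types reported by Hive's DESCRIBE.
cast_hive_temporal <- function(df, schema, tz = "UTC") {
  # SELECT * returned names prefixed with the table name
  # (e.g. "ibor_lending.cutoffdate"), so match on the part after the last dot.
  bare <- sub("^.*\\.", "", names(df))
  for (i in seq_along(bare)) {
    type <- schema$data_type[match(bare[i], schema$col_name)]
    if (is.na(type)) next  # column not listed in the schema: leave untouched
    if (type == "timestamp") {
      # Assumed string layout; %OS also accepts fractional seconds on input.
      df[[i]] <- as.POSIXct(df[[i]], tz = tz, format = "%Y-%m-%d %H:%M:%OS")
    } else if (type == "date") {
      df[[i]] <- as.Date(df[[i]], format = "%Y-%m-%d")
    }
  }
  df
}

# Stand-in data shaped like the outputs above (the values are made up).
schema <- data.frame(col_name = "cutoffdate", data_type = "timestamp",
                     stringsAsFactors = FALSE)
df <- data.frame(`ibor_lending.cutoffdate` = "2019-01-31 00:00:00",
                 check.names = FALSE, stringsAsFactors = FALSE)
str(cast_hive_temporal(df, schema))
```

With the objects from the script above, the call would presumably be cast_hive_temporal(show_databases, show_datatypes). This still casts in R, so it only reduces the coupling and maintenance concerns, not the overhead.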
Best regards, Dennis