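I'm trying to use Shark on EMR, but I can't seem to recover the partitions of a table whose location is set to an S3 bucket. When I try to show my partitions, I get nothing.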
shark> MSCK REPAIR TABLE logs ;
OK
Time taken: 1.79 seconds
shark> SHOW PARTITIONS logs ;
OK
Time taken: 0.073 seconds
I create my table with:
SET hive.exec.dynamic.partition = true ;
SET hive.exec.dynamic.partition.mode = nonstrict ;
CREATE EXTERNAL TABLE IF NOT EXISTS logs (
time STRING,
thread STRING,
logger STRING,
identity STRING,
message STRING,
logtype STRING,
logsubtype STRING,
node STRING,
storageallocationstatus STRING,
nodelist STRING,
userid STRING,
nodeid STRING,
path STRING,
datablockid STRING,
hash STRING,
size STRING,
value STRING,
exception STRING,
server STRING,
app STRING,
version STRING
)
PARTITIONED BY (
dt STRING,
level STRING
)
ROW FORMAT
DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://my-log/parsed-logs/' ;
My log bucket contains a folder at s3://my-log/parsed-logs/dt=2014-01-03/level=ERROR/.
According to the Hive Language Manual, the MSCK REPAIR TABLE logs command should be equivalent to Amazon's Hive extension ALTER TABLE logs RECOVER PARTITIONS, but when I run it I don't see any partitions. I tried exactly the same thing in Hive and it works like a charm:
hive> ALTER TABLE logs RECOVER PARTITIONS ;
OK
Time taken: 0.975 seconds
hive> SHOW PARTITIONS logs ;
OK
dt=2014-01-03/level=ERROR
Time taken: 0.078 seconds, Fetched: 1 row(s)
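For comparison, the partition that Hive discovered can also be registered explicitly with a standard ALTER TABLE ... ADD PARTITION statement, which names the partition and its location directly instead of relying on a directory scan. This is a minimal sketch assuming the single S3 folder above is the only partition; I have not confirmed how Shark on EMR treats this statement for S3 locations.

```sql
-- Register the one known partition by hand (assumes the folder listed above is the only one).
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE logs ADD IF NOT EXISTS
  PARTITION (dt = '2014-01-03', level = 'ERROR')
  LOCATION 's3://my-log/parsed-logs/dt=2014-01-03/level=ERROR/';

-- Check whether the metastore now lists the partition.
SHOW PARTITIONS logs;
```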
Am I missing something here when using Shark?