I set up an Amazon Elastic MapReduce job to run a Hive query:
CREATE EXTERNAL TABLE output_dailies (
day string, type string, subType string, product string, productDetails string,
uniqueUsers int, totalUsers int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '${OUTPUT}';
INSERT OVERWRITE TABLE output_dailies
SELECT day, type, subType, product, productDetails,
       COUNT(DISTINCT accountId) AS uniqueUsers,
       COUNT(accountId) AS totalUsers
FROM raw_logs
WHERE day = '${QUERY_DATE}'
GROUP BY day, type, subType, product, productDetails;
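After the job finishes, the output location configured on S3 contains 5 files named with the pattern task_201110280815_0001_r_00000x,
where x ranges from 0 to 4. The files are small, 35 KB each.
Is it possible to instruct Hive to store the results in a single file?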