I have a Hive table:
CREATE TABLE beacons
(
foo string,
bar string,
foonotbar string
)
COMMENT "Digest of daily beacons, by day"
PARTITIONED BY ( day string COMMENT "In YYYY-MM-DD format" );
To populate it, I'm doing something like this:
SET hive.exec.compress.output=True;
SET io.seqfile.compression.type=BLOCK;
INSERT OVERWRITE TABLE beacons PARTITION ( day = "2011-01-26" ) SELECT
someFunc(query, "foo") as foo,
someFunc(query, "bar") as bar,
otherFunc(query, "foo||bar") as foonotbar
FROM raw_logs
WHERE day = "2011-01-26";
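This builds a new partition with its output compressed via deflate, but ideally it would go through the LZO compression codec instead.
Unfortunately, I'm not sure how to accomplish that. I assume it's one of the many runtime settings, or perhaps just an additional line in the CREATE TABLE DDL.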