我是 hive 的新手,所以有一个基本问题:如何创建一个查询,以便该查询的结果以特定方式分区?
例如:
CREATE TABLE IF NOT EXISTS tbl_x (
x SMALLINT,
y FLOAT)
PARTITIONED BY (id SMALLINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC;
INSERT INTO TABLE `tbl_x`
VALUES (1, 1, 1.0),
(1, 1, 2.0),
(1, 2, 3.0),
(1, 2, 4.0),
(2, 1, 5.0),
(2, 1, 6.0),
(2, 2, 7.0),
(2, 2, 8.0);
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x`;
在那个例子中,我希望 tbl_y 也被分区。
尝试这个不起作用:
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x` PARTITIONED BY (id SMALLINT);
这里的诀窍是什么?我应该先定义分区表并将结果插入吗?