sql - Hive DISTINCT query takes a long time

Asked 2019-10-01T17:05:00.077

Viewed 171 times

I have a partitioned table with the following structure:

create table tab1
(
col1 int,
col2 string,
...
)
-- Hive rejects partition columns that are repeated in the column list,
-- so col50 and col51 are declared only here.
partitioned by
(col50 int, col51 int)
stored as orc;

We currently have about 17,000 partitions, and each partition holds at least roughly 50k records.

Each of the queries below takes a long time, about 90 minutes:

SELECT DISTINCT col2 FROM tab1;
SELECT col2 FROM (SELECT col2, row_number() OVER (PARTITION BY col2 ORDER BY col3) AS rnk FROM tab1) t1 WHERE t1.rnk = 1;
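For comparison, the first query above can also be written with GROUP BY, which returns the same distinct values of col2. The following is only a sketch: the two SET properties are standard Hive configuration options, but whether they are already enabled on this cluster, and whether they shorten the run on this table, is an assumption that has not been measured here.

```sql
-- Sketch only, not benchmarked on tab1
-- Assumption: vectorized execution is available for this ORC table
SET hive.vectorized.execution.enabled=true;
-- Assumption: map-side partial aggregation is not already switched on
SET hive.map.aggr=true;

-- Returns the same rows as SELECT DISTINCT col2 FROM tab1
SELECT col2
FROM tab1
GROUP BY col2;
```

Hive may plan this the same way as the DISTINCT form, so the timing may not change.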

Is there any way to reduce the execution time? Thanks in advance.

0 answers