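I have been running the query below against a 56 GB table (789,700,760 rows) and have hit an execution-time bottleneck. From some earlier examples, I believe there may be a way to "un-nest" the INNER JOIN so that the query performs better on large datasets. In particular, the query below takes 7.651 hours to complete on an MPP PostgreSQL deployment.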
create table large_table as
select column1, column2, column3, column4, column5, column6
from
(
    select
        a.column1, a.column2, a.start_time,
        rank() OVER (
            PARTITION BY a.column2, a.column1 order by a.start_time DESC
        ) as rank,
        last_value(a.column3) OVER (
            PARTITION BY a.column2, a.column1 order by a.start_time ASC
            RANGE BETWEEN unbounded preceding and unbounded following
        ) as column3,
        a.column4, a.column5, a.column6
    from
    (
        table2 s
        INNER JOIN table3 t
            ON s.column2 = t.column2 and s.event_time > t.start_time
    ) a
) b
where rank = 1;
Question 1: Is there a way to modify the SQL above to reduce the query's overall execution time?
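One observation that may help frame the question: the rows kept by rank = 1 are the ones with the latest start_time in each (column2, column1) partition, and last_value over the same partition ordered ascending with an unbounded frame also returns column3 from a row with the latest start_time. The second window therefore looks redundant, and dropping it could spare one sort over the joined rows. The sketch below is untested and rests on assumptions: that column1 and column3 through column6 come from table2 (s), that start_time comes from table3 (t), and that ties on the latest start_time either do not occur or carry the same column3. The alias rnk is introduced here only for illustration.

```sql
-- Sketch only, not benchmarked. The s./t. qualifiers are assumptions about
-- which table each column lives in; adjust them to the real schema.
CREATE TABLE large_table AS
SELECT column1, column2, column3, column4, column5, column6
FROM (
    SELECT
        s.column1, s.column2, s.column3, s.column4, s.column5, s.column6,
        -- A single window replaces the rank() / last_value() pair:
        -- the row ranked 1 already carries the column3 of the latest start_time.
        rank() OVER (
            PARTITION BY s.column2, s.column1
            ORDER BY t.start_time DESC
        ) AS rnk
    FROM table2 s
    INNER JOIN table3 t
        ON s.column2 = t.column2
       AND s.event_time > t.start_time
) b
WHERE rnk = 1;
```

The two forms differ only when several rows tie on the latest start_time with different column3 values: the original gives every tied row the same column3, while this sketch keeps each row's own value. Running EXPLAIN on both statements should show whether the planner actually drops a sort step, since the non-equality join condition may still dominate the cost.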