I have the following data lake dataset, which serves as the source for a dimension, and I want to migrate its historical data into that dimension.
For example:
Primarykey  Checksum  DateFrom  Dateto     ActiveFlag
1           11        01:00     03:00      False
1           22        03:00     05:00      False
1           22        05:00     07:00      False
1           11        07:00     09:00      False
1           11        09:00     12/31/999  True
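Note that the datalake table has several columns that are not part of the dimension, so we recalculate the checksum. As a result, consecutive rows can show the same checksum value but different datefrom and dateto.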
with base as (
    select
        Primary_key,
        checksum,
        first_value(datefrom) over (partition by Primary_key, checksum order by datefrom) as Datefrom,
        -- explicit frame so last_value sees the whole partition, not only the rows up to the current one
        last_value(dateto) over (partition by Primary_key, checksum order by datefrom
                                 rows between unbounded preceding and unbounded following) as Dateto,
        row_number() over (partition by Primary_key, checksum order by datefrom) as latest_record
    from Datalake.user
)
select * from base where latest_record = 1
The query returns:
Primarykey  Checksum  DateFrom  Dateto
1           11        01:00     12/31/999
1           22        03:00     07:00
But the expected result is:
Primarykey  Checksum  DateFrom  Dateto
1           11        01:00     03:00
1           22        03:00     07:00
1           11        07:00     12/31/999
I have tried several ways of doing this in a single query, without success. Any good suggestions?
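For reference, the expected result looks like a "gaps and islands" problem: rows should be merged only when the same checksum appears on consecutive rows, not whenever the checksum matches anywhere in the key's history. Below is a minimal sketch of that idea, run against SQLite from Python so it is self-contained. The table name `datalake_user`, the helper `collapse_runs`, and the zero-padded text timestamps are assumptions made for illustration; the real table and column types may differ, and the window syntax may need adjusting for another engine.

```python
import sqlite3

# Sample rows copied from the question. The table name "datalake_user" and the
# text-typed timestamps are assumptions for this sketch only.
ROWS = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
]

QUERY = """
with marked as (
    select primary_key, checksum, datefrom, dateto,
           -- 1 when the checksum differs from the previous row of the same key
           case when checksum = lag(checksum) over (partition by primary_key order by datefrom)
                then 0 else 1 end as is_new_run
    from datalake_user
),
runs as (
    select *,
           -- running total of the flags gives each consecutive run its own id
           sum(is_new_run) over (partition by primary_key order by datefrom) as run_id
    from marked
)
select distinct
       primary_key,
       checksum,
       first_value(datefrom) over w as datefrom,
       last_value(dateto) over w as dateto
from runs
window w as (partition by primary_key, run_id order by datefrom
             rows between unbounded preceding and unbounded following)
order by primary_key, datefrom
"""


def collapse_runs(rows):
    """Load the rows into an in-memory SQLite table and merge consecutive runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table datalake_user "
        "(primary_key int, checksum int, datefrom text, dateto text)"
    )
    conn.executemany("insert into datalake_user values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse_runs(ROWS):
        print(row)
```

The key difference from partitioning by `Primary_key, checksum` is the `run_id` column: the two separate stretches of checksum 11 receive different run ids, so they are no longer merged into one row.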