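The source data in my Snowflake data warehouse has "gaps", i.e. missing days, and I need to fill each missing day with the data from the previous row. So I created this dummy project to practice using a FULL OUTER JOIN and the window function FIRST_VALUE.
My sample data consists of two tables, prev_test and dim_date: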
create TABLE prev_test (
SITE_ID VARCHAR(650),
SUBSCRIPTION_ID VARCHAR(650),
ORDER_CREATED DATE,
ORDER_TYPE VARCHAR(650),
SUBSCRIPTION_STATUS VARCHAR(650),
PERIOD_NORMALIZER VARCHAR(650),
CHANGE_MRR_EVENT_TYPE VARCHAR(650),
TOTAL integer,
DAILY_MRR integer
);
INSERT INTO prev_test
VALUES('AB', '123', '2021-09-17', 'PRORATED', 'ACTIVE', '1M', 'New', 60, 2);
INSERT INTO prev_test
VALUES('AB', '123', '2021-09-20', 'PRORATED', 'ACTIVE', '1M', 'New', 30, 10);
create TABLE dim_date (
date_key date
);
INSERT INTO dim_date
VALUES('2021-09-17');
INSERT INTO dim_date
VALUES('2021-09-18');
INSERT INTO dim_date
VALUES('2021-09-19');
INSERT INTO dim_date
VALUES('2021-09-20');
When I select my data like this:
SELECT
CASE WHEN SITE_ID IS NULL THEN FIRST_VALUE(SITE_ID) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE SITE_ID END AS SITE_ID ,
CASE WHEN SUBSCRIPTION_ID IS NULL THEN FIRST_VALUE(SUBSCRIPTION_ID) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE SUBSCRIPTION_ID END AS SUBSCRIPTION_ID ,
CASE WHEN ORDER_CREATED IS NULL THEN dd.DATE_KEY ELSE ORDER_CREATED END AS ORDER_CREATED,
CASE WHEN ORDER_TYPE IS NULL THEN FIRST_VALUE(ORDER_TYPE) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE ORDER_TYPE END AS ORDER_TYPE ,
CASE WHEN SUBSCRIPTION_STATUS IS NULL THEN FIRST_VALUE(SUBSCRIPTION_STATUS) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE SUBSCRIPTION_STATUS END AS SUBSCRIPTION_STATUS ,
CASE WHEN PERIOD_NORMALIZER IS NULL THEN FIRST_VALUE(PERIOD_NORMALIZER) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE PERIOD_NORMALIZER END AS PERIOD_NORMALIZER ,
CASE WHEN CHANGE_MRR_EVENT_TYPE IS NULL THEN FIRST_VALUE(CHANGE_MRR_EVENT_TYPE) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE CHANGE_MRR_EVENT_TYPE END AS CHANGE_MRR_EVENT_TYPE ,
CASE WHEN TOTAL IS NULL THEN FIRST_VALUE(TOTAL) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE TOTAL END AS TOTAL ,
CASE WHEN DAILY_MRR IS NULL THEN FIRST_VALUE(DAILY_MRR) OVER (ORDER BY ORDER_CREATED ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ELSE DAILY_MRR END AS DAILY_MRR
FROM prev_test
FULL OUTER JOIN DIM_DATE dd ON ORDER_CREATED = dd.DATE_KEY
ORDER BY 3 asc
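However, as soon as I apply the same code to my production data (which has the same structure as my sample), it doesn't work at all: the missing days are not filled in, and FIRST_VALUE does not "repeat" the data into the following records. There is no error.
Here is an example of my "real" data: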
SITE_ID |SUBSCRIPTION_ID |ORDER_CREATED|ORDER_TYPE |SUBSCRIPTION_STATUS|PERIOD_NORMALIZER|CHANGE_MRR_EVENT_TYPE |TOTAL |DAILY_MRR |
L22|JriInfs| 2021-06-02|PRORATED|Active |1M |Upgraded From | 0.0000| 0.000000|
L22|JriInfs| 2021-09-17|PRORATED|Active |1M |New |209.0000|209.000000|
L22|JriInfs| 2021-09-30|PRORATED|Active |1M |Changed |269.0000|269.000000|
L22|JriInfs| 2021-10-08|PRORATED|Active |1M |Downgraded From| 0.0000| 0.000000|
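A small reproduction can show what the FIRST_VALUE expressions compute once a subscription has more than two orders, as in the real rows above. This is a sketch under stated assumptions, not a drop-in fix. It runs the SQL in SQLite through Python's standard sqlite3 module instead of Snowflake. It keeps only a hypothetical `total` column with made-up values. It uses a LEFT JOIN from dim_date in place of the FULL OUTER JOIN, which is equivalent here because dim_date contains every order date. It also orders the window by dd.date_key, because ORDER_CREATED is NULL on exactly the gap rows being filled. With an unbounded frame, every gap row receives the value of the first row in the whole window. That coincides with "the previous row" only when the gap directly follows the first order. The second query is one common forward-fill alternative: a running COUNT of real orders gives each gap day the same group number as the order before it, and FIRST_VALUE within that group returns that order's value.

```python
import sqlite3

# Hypothetical data for one subscription: three orders, two gaps.
# SQLite stands in for Snowflake; only the window logic is compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prev_test (order_created TEXT, total INTEGER);
    INSERT INTO prev_test VALUES
        ('2021-09-17', 209), ('2021-09-20', 269), ('2021-09-23', 0);
    CREATE TABLE dim_date (date_key TEXT);
    INSERT INTO dim_date VALUES
        ('2021-09-17'), ('2021-09-18'), ('2021-09-19'), ('2021-09-20'),
        ('2021-09-21'), ('2021-09-22'), ('2021-09-23');
""")

# Pattern from the question: FIRST_VALUE over the whole (unbounded) frame,
# so every gap row gets the value of the very first row in the window.
WHOLE_FRAME = """
SELECT dd.date_key,
       CASE WHEN p.total IS NULL
            THEN FIRST_VALUE(p.total) OVER (
                     ORDER BY dd.date_key
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            ELSE p.total END AS total
FROM dim_date dd
LEFT JOIN prev_test p ON p.order_created = dd.date_key
ORDER BY dd.date_key
"""

# Forward fill: the running count of real orders stays constant across a gap,
# so FIRST_VALUE within each count group is the most recent order's value.
FORWARD_FILL = """
WITH joined AS (
    SELECT dd.date_key, p.total,
           COUNT(p.order_created) OVER (ORDER BY dd.date_key) AS grp
    FROM dim_date dd
    LEFT JOIN prev_test p ON p.order_created = dd.date_key
)
SELECT date_key,
       FIRST_VALUE(total) OVER (PARTITION BY grp ORDER BY date_key) AS total
FROM joined
ORDER BY date_key
"""

whole = [total for _, total in conn.execute(WHOLE_FRAME)]
filled = [total for _, total in conn.execute(FORWARD_FILL)]
print(whole)   # [209, 209, 209, 269, 209, 209, 0]
print(filled)  # [209, 209, 209, 269, 269, 269, 0]
```

The running-count idea uses only COUNT and FIRST_VALUE over ordered windows, so it should carry over to Snowflake. There, it is probably safer to spell out the frame of the running COUNT explicitly (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rather than rely on defaults.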
This is the output I want (rows wrapped in ** are the "missing" days); I hope this helps:
SITE_ID |SUBSCRIPTION_ID |ORDER_CREATED|ORDER_TYPE |SUBSCRIPTION_STATUS|PERIOD_NORMALIZER|CHANGE_MRR_EVENT_TYPE |TOTAL |DAILY_MRR |
L22|JriInfs| 2021-06-02|PRORATED|Active |1M |Upgraded From | 0.0000| 0.000000|
**L22|JriInfs| 2021-06-03|PRORATED|Active |1M |Upgraded From | 0.0000| 0.000000|**
…
**L22|JriInfs| 2021-09-16|PRORATED|Active |1M |Upgraded From | 0.0000| 0.000000|**
L22|JriInfs| 2021-09-17|PRORATED|Active |1M |New |209.0000|209.000000|
**L22|JriInfs| 2021-09-18|PRORATED|Active |1M |New |209.0000|209.000000|**
…
**L22|JriInfs| 2021-09-29|PRORATED|Active |1M |New |209.0000|209.000000|**
L22|JriInfs| 2021-09-30|PRORATED|Active |1M |Changed |269.0000|269.000000|
...
What am I missing?
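The following sketches how the desired output above could be produced when the table holds more than one subscription. It uses the same SQLite stand-in and hypothetical data for two subscriptions; the real column set is trimmed to CHANGE_MRR_EVENT_TYPE and TOTAL. The window in the question has no PARTITION BY, so every row is compared against the whole table regardless of SITE_ID or SUBSCRIPTION_ID. This sketch makes two changes. First, it builds a per-subscription date spine starting at each subscription's first order. Second, it partitions the running-count grouping by subscription. It assumes at most one order per subscription per day. In Snowflake itself, LAST_VALUE(...) IGNORE NULLS OVER (PARTITION BY ... ORDER BY date_key ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) should express the same forward fill more directly. SQLite has no IGNORE NULLS, which is why the grouping trick is used here.

```python
import sqlite3

# Hypothetical data: two subscriptions on one site, with gaps between orders.
# SQLite stands in for Snowflake; only the SQL logic is being illustrated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prev_test (site_id TEXT, subscription_id TEXT,
                            order_created TEXT, change_mrr_event_type TEXT,
                            total REAL);
    INSERT INTO prev_test VALUES
        ('L22', 'JriInfs',  '2021-09-17', 'New',     209),
        ('L22', 'JriInfs',  '2021-09-20', 'Changed', 269),
        ('L22', 'OtherSub', '2021-09-18', 'New',     50);
    CREATE TABLE dim_date (date_key TEXT);
    INSERT INTO dim_date VALUES ('2021-09-16'), ('2021-09-17'), ('2021-09-18'),
                                ('2021-09-19'), ('2021-09-20'), ('2021-09-21');
""")

FILL_QUERY = """
WITH subs AS (        -- first order date per subscription
    SELECT site_id, subscription_id, MIN(order_created) AS first_day
    FROM prev_test
    GROUP BY site_id, subscription_id
),
spine AS (            -- every calendar day from that first order onward
    SELECT s.site_id, s.subscription_id, d.date_key
    FROM subs s
    JOIN dim_date d ON d.date_key >= s.first_day
),
joined AS (           -- real orders where they exist, NULLs on gap days
    SELECT sp.site_id, sp.subscription_id, sp.date_key,
           p.change_mrr_event_type, p.total,
           -- running count of real orders: unchanged across a gap and
           -- +1 on each order day, so a gap shares its group with the prior order
           COUNT(p.order_created) OVER (
               PARTITION BY sp.site_id, sp.subscription_id
               ORDER BY sp.date_key) AS grp
    FROM spine sp
    LEFT JOIN prev_test p
           ON p.site_id = sp.site_id
          AND p.subscription_id = sp.subscription_id
          AND p.order_created = sp.date_key
)
SELECT site_id, subscription_id, date_key AS order_created,
       FIRST_VALUE(change_mrr_event_type) OVER (
           PARTITION BY site_id, subscription_id, grp
           ORDER BY date_key) AS change_mrr_event_type,
       FIRST_VALUE(total) OVER (
           PARTITION BY site_id, subscription_id, grp
           ORDER BY date_key) AS total
FROM joined
ORDER BY site_id, subscription_id, date_key
"""

rows = conn.execute(FILL_QUERY).fetchall()
for row in rows:
    print(row)
```

The other columns (ORDER_TYPE, SUBSCRIPTION_STATUS, PERIOD_NORMALIZER, DAILY_MRR) would each get the same FIRST_VALUE expression over the same partition.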