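I'm trying to figure out how many days each "event type" has been occurring. For example, I have several users, several event types, and dates. I want to add a column for "days since last event" (see image). What is the SQL syntax for this, specifically in Snowflake? The image shows exactly what I'm trying to do, but I created the example in Excel.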
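The key statement is the comment "if the event type hasn't changed, take count + 1; if the event type changes, start counting from 1". It translates into a window function that detects changes in the event type, followed by the row_number function.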
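Here is a minimal sketch of that rule outside SQL, in plain Python. It assumes the question's sample rows, with dates as ISO strings so they sort correctly. The counter goes up by one while the type stays the same and resets to 1 when either the type or the user changes:

```python
# Sample rows from the question: (user, event_type, date as ISO text).
rows = [
    (12345, "active", "2021-01-15"), (12345, "active", "2021-01-16"),
    (12345, "active", "2021-01-17"), (12345, "dormant", "2021-01-18"),
    (12345, "dormant", "2021-01-19"), (12345, "churned", "2021-01-20"),
    (12345, "churned", "2021-01-21"), (12345, "churned", "2021-01-22"),
    (39498, "active", "2021-01-15"), (39498, "active", "2021-01-16"),
    (39498, "dormant", "2021-01-17"), (39498, "churned", "2021-01-18"),
]

def run_counts(rows):
    """Same type as the previous row of the same user -> count + 1, else restart at 1.
    Returns the counts in (user, date) order."""
    counts = []
    prev_user, prev_type, count = None, None, 0
    for user, event_type, _date in sorted(rows, key=lambda r: (r[0], r[2])):
        if user == prev_user and event_type == prev_type:
            count += 1
        else:
            count = 1
        counts.append(count)
        prev_user, prev_type = user, event_type
    return counts

print(run_counts(rows))
# [1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 1, 1]
```

The reset on a user change is an assumption about the intent. It matches the expected result, where user 39498 starts again at 1.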
There is an interesting window function called conditional_change_event (https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event.html) that detects a change between the current row and the previous row.
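The following rough Python emulation makes the numbering concrete. It assumes the partition is already ordered and contains no NULLs, because Snowflake's NULL handling is not modeled. The numbering starts at 0 and goes up by 1 at each change. Applied to the sample types in id order, it reproduces the TYPE_CHANGES column of the output below:

```python
def conditional_change_event(values):
    """Emulate the numbering over an already-ordered partition: start at 0 and
    add 1 whenever a value differs from the previous row's value."""
    events = []
    counter = 0
    for i, value in enumerate(values):
        if i > 0 and value != values[i - 1]:
            counter += 1
        events.append(counter)
    return events

# Sample event types in id order, taken from the question's data.
types = ["active", "active", "active", "dormant", "dormant",
         "churned", "churned", "churned",
         "active", "active", "dormant", "churned"]
print(conditional_change_event(types))
# [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
```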
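Note that conditional_change_event takes an order by clause. To give the rows a well-defined order, let's introduce an autoincrement id column. With that in place, let's get to some SQL.
Create the table with sample values: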
create or replace temporary table _temp (
id int autoincrement,
user int,
_type varchar,
dates date
);
insert into _temp(user, _type, dates)
values (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'));
Query
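with event_changes as (
    select *,
           -- goes up by 1 every time _type differs from the previous row (ordered by id)
           conditional_change_event(_type) over (order by id) as type_changes
    from _temp
)
select *,
       -- number the rows inside each run of identical _type values, restarting at 1
       row_number() over(partition by _type, type_changes order by id) as days_since_last_event
from event_changes
order by id;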
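The same idea can be tried without a Snowflake account by using Python's built-in sqlite3 module. This is a port, not the query above. SQLite has no CONDITIONAL_CHANGE_EVENT, so a LAG-based change flag plus a running SUM stands in for it. Both window steps are also partitioned by user, which the query above does not do. Without that partition, if one user's last type equaled the next user's first type, the two runs would merge. Adding partition by user to both OVER clauses is likely the Snowflake equivalent. The sketch assumes SQLite 3.25 or newer, which is needed for window functions. The table name events is hypothetical.

```python
import sqlite3

# Sample rows from the question; dates stored as ISO-8601 text so they sort correctly.
ROWS = [
    (12345, "active", "2021-01-15"), (12345, "active", "2021-01-16"),
    (12345, "active", "2021-01-17"), (12345, "dormant", "2021-01-18"),
    (12345, "dormant", "2021-01-19"), (12345, "churned", "2021-01-20"),
    (12345, "churned", "2021-01-21"), (12345, "churned", "2021-01-22"),
    (39498, "active", "2021-01-15"), (39498, "active", "2021-01-16"),
    (39498, "dormant", "2021-01-17"), (39498, "churned", "2021-01-18"),
]

QUERY = """
WITH flagged AS (
    -- 1 when the type differs from the same user's previous row (or there is none)
    SELECT "user", type, dates,
           CASE WHEN type = LAG(type) OVER (PARTITION BY "user" ORDER BY dates)
                THEN 0 ELSE 1 END AS is_change
    FROM events
),
runs AS (
    -- running total of the flags numbers each run (from 1, unlike the 0-based type_changes)
    SELECT "user", type, dates,
           SUM(is_change) OVER (PARTITION BY "user" ORDER BY dates) AS run_id
    FROM flagged
)
SELECT "user", type, dates,
       ROW_NUMBER() OVER (PARTITION BY "user", run_id ORDER BY dates) AS days_since_last_event
FROM runs
ORDER BY "user", dates
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events ("user" INTEGER, type TEXT, dates TEXT)')
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```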
Output
ID USER _TYPE DATES TYPE_CHANGES DAYS_SINCE_LAST_EVENT
1 12345 active 2021-01-15 0 1
2 12345 active 2021-01-16 0 2
3 12345 active 2021-01-17 0 3
4 12345 dormant 2021-01-18 1 1
5 12345 dormant 2021-01-19 1 2
6 12345 churned 2021-01-20 2 1
7 12345 churned 2021-01-21 2 2
8 12345 churned 2021-01-22 2 3
9 39498 active 2021-01-15 3 1
10 39498 active 2021-01-16 3 2
11 39498 dormant 2021-01-17 4 1
12 39498 churned 2021-01-18 5 1
answered 2022-01-16T09:33:52.220
Using LEAD, NVL, FIRST_VALUE, LAG, DATEDIFF, DATEADD, and IFF.
The verbose steps look like this:
SELECT
user,
type,
dates,
lead(type)over(partition by user order by dates) as lead_event,
iff(lead_event != type, dates, null) as diff_event_date,
first_value(dates)over(partition by user order by dates) as first_user_date,
lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
datediff(day, date_of_true_interest, dates) as "days since last event"
FROM data
ORDER BY 1,3
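But this doesn't work. The LAG runs over diff_event_date, which is derived from the LEAD window function, and a window function can't consume another window function's result in the same SELECT. So we need to move the LAG into a second logical stage: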
SELECT
user,
type,
dates,
lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
SELECT
user,
type,
dates,
lead(type)over(partition by user order by dates) as lead_event,
iff(lead_event != type, dates, null) as diff_event_date,
first_value(dates)over(partition by user order by dates) as first_user_date
FROM data
)
ORDER BY 1,3;
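To see what each column holds, the two stages can be traced row by row in plain Python. This is a sketch, not the answer's code. Each step's comment names the Snowflake function it stands in for, and the sample rows are the question's data:

```python
from datetime import date, timedelta

# Sample rows from the question: (user, type, dates).
ROWS = [
    (12345, "active", date(2021, 1, 15)), (12345, "active", date(2021, 1, 16)),
    (12345, "active", date(2021, 1, 17)), (12345, "dormant", date(2021, 1, 18)),
    (12345, "dormant", date(2021, 1, 19)), (12345, "churned", date(2021, 1, 20)),
    (12345, "churned", date(2021, 1, 21)), (12345, "churned", date(2021, 1, 22)),
    (39498, "active", date(2021, 1, 15)), (39498, "active", date(2021, 1, 16)),
    (39498, "dormant", date(2021, 1, 17)), (39498, "churned", date(2021, 1, 18)),
]

def days_since_last_event(rows):
    """Trace the two-stage query one user partition at a time."""
    out = []
    for user in sorted({r[0] for r in rows}):                   # PARTITION BY user
        part = sorted((r for r in rows if r[0] == user), key=lambda r: r[2])  # ORDER BY dates
        first_user_date = part[0][2]                            # FIRST_VALUE(dates)
        # Inner SELECT: LEAD(type) + IFF -> this row's date if the next row's type differs
        diff_event_date = [
            part[i][2] if i + 1 < len(part) and part[i + 1][1] != part[i][1] else None
            for i in range(len(part))
        ]
        # Outer SELECT: LAG(...) IGNORE NULLS -> latest non-null value from an earlier row
        last_seen = None
        for i, (u, event_type, d) in enumerate(part):
            last_event_date = last_seen
            # NVL(last_event_date, DATEADD(day, -1, first_user_date))
            if last_event_date is None:
                date_of_true_interest = first_user_date - timedelta(days=1)
            else:
                date_of_true_interest = last_event_date
            # DATEDIFF(day, date_of_true_interest, dates)
            out.append((u, event_type, d, (d - date_of_true_interest).days))
            if diff_event_date[i] is not None:
                last_seen = diff_event_date[i]
    return out

for row in days_since_last_event(ROWS):
    print(row)
```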
That two-level query is ugly, so tidy it up with one more layer:
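SELECT
    user,
    type,
    dates,
    -- stage 3: days between this row and the end of the previous run
    datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
    SELECT
        user,
        type,
        dates,
        -- stage 2: most recent run-end date from an earlier row of the same user
        lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
        -- no earlier run: treat the day before the user's first row as the previous run end
        NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
    FROM (
        SELECT
            user,
            type,
            dates,
            -- stage 1: the next row's type for the same user, in date order
            lead(type)over(partition by user order by dates) as lead_event,
            -- this row's date if it ends a run (the next type differs), else NULL
            iff(lead_event != type, dates, null) as diff_event_date,
            first_value(dates)over(partition by user order by dates) as first_user_date
        FROM data
    )
)
ORDER BY 1,3;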
Putting the sample data in a CTE named data in front of the query:
WITH data AS (
select * from values
(12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'))
v( user, type, dates)
)
we get the result:
USER | TYPE | DATES | days since last event |
---|---|---|---|
12345 | active | 2021-01-15 | 1 |
12345 | active | 2021-01-16 | 2 |
12345 | active | 2021-01-17 | 3 |
12345 | dormant | 2021-01-18 | 1 |
12345 | dormant | 2021-01-19 | 2 |
12345 | churned | 2021-01-20 | 1 |
12345 | churned | 2021-01-21 | 2 |
12345 | churned | 2021-01-22 | 3 |
39498 | active | 2021-01-15 | 1 |
39498 | active | 2021-01-16 | 2 |
39498 | dormant | 2021-01-17 | 1 |
39498 | churned | 2021-01-18 | 1 |
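For a local check, here is a port of the three-layer query to SQLite via Python's sqlite3 module. It is a sketch that assumes SQLite 3.25 or newer, and the table name events is hypothetical. Several Snowflake functions are swapped for SQLite equivalents: IFF becomes CASE, NVL becomes COALESCE, DATEADD becomes date(x, '-1 day'), and DATEDIFF becomes a julianday difference. SQLite's LAG has no IGNORE NULLS, so a MAX over the earlier rows stands in for it. That substitution is only valid here because each user's run-end dates grow with dates, so the largest earlier non-null value is also the most recent one.

```python
import sqlite3

# Sample rows from the question; dates stored as ISO-8601 text.
ROWS = [
    (12345, "active", "2021-01-15"), (12345, "active", "2021-01-16"),
    (12345, "active", "2021-01-17"), (12345, "dormant", "2021-01-18"),
    (12345, "dormant", "2021-01-19"), (12345, "churned", "2021-01-20"),
    (12345, "churned", "2021-01-21"), (12345, "churned", "2021-01-22"),
    (39498, "active", "2021-01-15"), (39498, "active", "2021-01-16"),
    (39498, "dormant", "2021-01-17"), (39498, "churned", "2021-01-18"),
]

QUERY = """
SELECT "user", type, dates,
       -- DATEDIFF(day, date_of_true_interest, dates)
       CAST(julianday(dates) - julianday(date_of_true_interest) AS INTEGER)
           AS days_since_last_event
FROM (
    SELECT "user", type, dates,
           -- NVL(LAG(diff_event_date) IGNORE NULLS ..., DATEADD(day, -1, first_user_date))
           COALESCE(
               MAX(diff_event_date) OVER (
                   PARTITION BY "user" ORDER BY dates
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
               date(first_user_date, '-1 day')
           ) AS date_of_true_interest
    FROM (
        SELECT "user", type, dates,
               -- IFF(LEAD(type) != type, dates, NULL)
               CASE WHEN LEAD(type) OVER (PARTITION BY "user" ORDER BY dates) != type
                    THEN dates END AS diff_event_date,
               FIRST_VALUE(dates) OVER (PARTITION BY "user" ORDER BY dates) AS first_user_date
        FROM events
    ) AS stage1
) AS stage2
ORDER BY "user", dates
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events ("user" INTEGER, type TEXT, dates TEXT)')
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```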
answered 2022-01-10T22:20:47.490