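I'm trying to work out how many days each "event type" has been going on. For example, I have several users, several event types, and dates. I want to add a column for "days since last event" (see image). What is the SQL syntax for this, specifically in Snowflake? The image shows exactly what I'm trying to do, but I created that example in Excel.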

[Image: output goal]

2 Answers

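The comment "if the event type did not change, take the count + 1; if the event type changed, start counting from 1" is the important statement. It translates into a window function over the event type, followed by the row_number function.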

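There is an interesting window function called conditional_change_event (https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event.html) that helps here: within a window, it returns a counter that starts at 0 and goes up by 1 each time the value in the current row differs from the value in the previous row.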
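
To make that behaviour concrete, here is a minimal Python sketch of the counter for a single, already-ordered window. It is an illustration only, not Snowflake code: the function name simply mirrors the SQL function, and NULL handling is left out.

```python
def conditional_change_event(values):
    """Emulate the change counter for one already-ordered window.

    The counter starts at 0 and goes up by 1 whenever a value differs
    from the value on the previous row.
    """
    events = []
    counter = 0
    previous = None
    for index, value in enumerate(values):
        # The first row has no previous row, so it never counts as a change.
        if index > 0 and value != previous:
            counter += 1
        events.append(counter)
        previous = value
    return events


types = ["active", "active", "active", "dormant", "dormant", "churned"]
print(conditional_change_event(types))  # [0, 0, 0, 1, 1, 2]
```

Rows that share a counter value belong to the same unbroken run of one event type, which is what makes the counter usable as a partition key for row_number.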

Note that conditional_change_event needs an order by clause, so let's introduce an autoincrement id column. With that said, let's start with some SQL queries.

Create a table with sample values:

create or replace temporary table _temp (
 id int autoincrement,
 user int,
 _type varchar,
 dates date
);

insert into _temp(user, _type, dates)
values (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'));

Target query:

with event_changes as (
  select *,
  conditional_change_event(_type) over (order by id) as type_changes
  from _temp
)
select *, 
row_number() over(partition by _type, type_changes order by id) as days_since_last_event
from event_changes
order by id;

Output:

  ID    USER    _TYPE   DATES   TYPE_CHANGES    DAYS_SINCE_LAST_EVENT
   1    12345   active  2021-01-15  0   1
   2    12345   active  2021-01-16  0   2
   3    12345   active  2021-01-17  0   3
   4    12345   dormant 2021-01-18  1   1
   5    12345   dormant 2021-01-19  1   2
   6    12345   churned 2021-01-20  2   1
   7    12345   churned 2021-01-21  2   2
   8    12345   churned 2021-01-22  2   3
   9    39498   active  2021-01-15  3   1
  10    39498   active  2021-01-16  3   2
  11    39498   dormant 2021-01-17  4   1
  12    39498   churned 2021-01-18  5   1
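
One thing to watch: the query orders the change counter by id across the whole table, with no partition by user. On the sample data that is harmless, but if one user's last event type matched the next user's first type, the count would carry over from one user to the next. The following is a minimal sketch of a per-user variant, not the query from this answer. It runs in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions), swaps conditional_change_event for the portable LAG plus running SUM pattern, and uses the hypothetical names events, user_id, event_type, event_date and days_in_run.

```python
import sqlite3

ROWS = [
    (12345, "active", "2021-01-15"),
    (12345, "active", "2021-01-16"),
    (12345, "churned", "2021-01-17"),
    # The next user starts with the type the previous user ended on.
    (39498, "churned", "2021-01-15"),
    (39498, "churned", "2021-01-16"),
    (39498, "active", "2021-01-17"),
]

QUERY = """
WITH flagged AS (
    SELECT *,
           -- 1 when the type differs from the user's previous row, else 0
           CASE WHEN event_type = LAG(event_type) OVER (
                    PARTITION BY user_id ORDER BY event_date)
                THEN 0 ELSE 1 END AS is_change
    FROM events
),
grouped AS (
    SELECT *,
           -- running total of changes = run number within each user
           SUM(is_change) OVER (
               PARTITION BY user_id ORDER BY event_date) AS run_id
    FROM flagged
)
SELECT user_id, event_type, event_date,
       ROW_NUMBER() OVER (
           PARTITION BY user_id, run_id ORDER BY event_date) AS days_in_run
FROM grouped
ORDER BY user_id, event_date
"""


def days_in_run(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, event_type TEXT, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in days_in_run(ROWS):
    print(row)
```

Because both the change flag and the running total are partitioned by user_id, the second user's count restarts at 1 even though the event type matches the row just before it. In Snowflake the same effect should be achievable by adding partition by user to both window clauses of the original query.
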
answered 2022-01-16T09:33:52.220

Using NVL, FIRST_VALUE, LAG, DATEDIFF, DATEADD and IFF:

The long-winded steps look like this:

SELECT 
   user,
   type,
   dates,
   lead(type)over(partition by user order by dates) as lead_event,
   iff(lead_event != type, dates, null) as diff_event_date,
   first_value(dates)over(partition by user order by dates) as first_user_date,
   lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
   NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
   datediff(day, date_of_true_interest, dates) as "days since last event"
FROM data
ORDER BY 1,3

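But that doesn't work: lag cannot be applied to diff_event_date while that column is still being computed from another window function (lead) in the same SELECT. So the second lag has to move into a second layer of logic: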

SELECT 
   user,
   type,
   dates,
   lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
   NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
   datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
    SELECT 
       user,
       type,
       dates,
       lead(type)over(partition by user order by dates) as lead_event,
       iff(lead_event != type, dates, null) as diff_event_date,
       first_value(dates)over(partition by user order by dates) as first_user_date   
    FROM data
)
ORDER BY 1,3;

That is ugly, so tidy it up with one more layer:

SELECT 
   user,
   type,
   dates,
   datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
    SELECT 
       user,
       type,
       dates,
       lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
       NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
    FROM (
        SELECT 
           user,
           type,
           dates,
           lead(type)over(partition by user order by dates) as lead_event,
           iff(lead_event != type, dates, null) as diff_event_date,
           first_value(dates)over(partition by user order by dates) as first_user_date
        FROM data
    ) 
)
ORDER BY 1,3;

Put some sample data in front of it as a CTE named data:

WITH data AS (
    select * from values 
        (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
        (12345,'active',to_date('1/16/21', 'MM/DD/YY')),
        (12345,'active',to_date('1/17/21', 'MM/DD/YY')),
        (12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
        (12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
        (39498,'active',to_date('1/15/21', 'MM/DD/YY')),
        (39498,'active',to_date('1/16/21', 'MM/DD/YY')),
        (39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
        (39498,'churned',to_date('1/18/21', 'MM/DD/YY'))
    v( user, type, dates)
)

and we get these results:

USER    TYPE     DATES       days since last event
12345   active   2021-01-15  1
12345   active   2021-01-16  2
12345   active   2021-01-17  3
12345   dormant  2021-01-18  1
12345   dormant  2021-01-19  2
12345   churned  2021-01-20  1
12345   churned  2021-01-21  2
12345   churned  2021-01-22  3
39498   active   2021-01-15  1
39498   active   2021-01-16  2
39498   dormant  2021-01-17  1
39498   churned  2021-01-18  1
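
To show how the three nested SELECTs fit together, here is a minimal Python sketch that walks through the same steps row by row. It is an illustration rather than a translation Snowflake would run: the function name is hypothetical, and the comments map each line to the SQL expression it stands in for.

```python
from datetime import date, timedelta
from itertools import groupby


def days_since_last_event(rows):
    """rows: (user, type, date) tuples. Returns rows with the day count appended."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[0]):  # partition by user
        group = list(group)
        first_user_date = group[0][2]   # first_value(dates)
        last_event_date = None          # lag(diff_event_date) ignore nulls
        for i, (user, type_, day) in enumerate(group):
            # nvl(last_event_date, dateadd(day, -1, first_user_date))
            if last_event_date is None:
                reference = first_user_date - timedelta(days=1)
            else:
                reference = last_event_date
            # datediff(day, date_of_true_interest, dates)
            out.append((user, type_, day, (day - reference).days))
            # lead(type) and iff(...): keep this date if the next row's type differs
            if i + 1 < len(group) and group[i + 1][1] != type_:
                last_event_date = day
    return out


sample = [
    (1, "active", date(2021, 1, 15)),
    (1, "active", date(2021, 1, 18)),   # note the three-day gap
    (1, "dormant", date(2021, 1, 19)),
]
print([r[3] for r in days_since_last_event(sample)])  # [1, 4, 1]
```

The gap in the sample is deliberate. This approach measures calendar days, so the second row counts 4, whereas a row_number over the same rows would count 2. On data with exactly one row per day, both approaches give the same numbers.
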
answered 2022-01-10T22:20:47.490