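I'm building a query to track a user's lifecycle through the platform via events. The EVENTS table has 3 columns: USER_ID, DATE_TIME, and EVENT_NAME. Below is a snapshot of the table: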

[image: snapshot of the EVENTS table]

Here is my query:

SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
    MEASURES MIN(IFF(EVENT_NAME = 'registration new', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'registration pending confirm', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your business information', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'your personal information', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(EVENT_NAME = 'qualified', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
  ONE ROW PER MATCH
  PATTERN(STEP_1 ANYTHING* STEP_5)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        STEP_2 AS EVENT_NAME = 'registration pending confirm',
        STEP_3 AS EVENT_NAME = 'your business information',
        STEP_4 AS EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified'
)

My expected result:

[images: expected result]

What I currently get:

[images: current result]

Here are my requirements/notes:

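  • The timestamp of each event should be greater than or equal to the timestamp of the previous event in the funnel, taking whichever qualifying occurrence comes first, so that the timestamps stay equal or keep increasing through the funnel. The difference between the current and expected results illustrates this logic well, namely the values in the REGISTRATION_PENDING_CONFIRM_TIMESTAMP and QUALIFIED_TIMESTAMP columns.
  • Not all users have all 5 events. For example, if USER_ID 54321 doesn't have (or skipped) the event 'your personal information', the result must still contain data for the remaining steps. At the moment, a user who is missing any event in the funnel is not returned by the query at all. I suspect this is because the pattern search fails when an event defined as a measure is missing from the user's flow.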

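The events in the table are not in a consistent order, so I defined them in the MEASURES section in the order given by the business/funnel logic.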

1 Answer

This is not a complete answer, but at least it defines sample data (rather than just screenshots) and introduces the use of CLASSIFIER:

create or replace temp table events as
select $1 user_id, $2 date_time, $3 event_name
from values(1,'2020-11-26 15:24:00','registration new')
, (1,'2021-04-12 18:00:00','registration new')
, (1,'2020-11-26 15:24:00','registration pending confirm')
, (1,'2021-04-12 18:11:00','registration pending confirm')
, (1,'2021-04-18 15:04:00','your personal information')
, (1,'2021-04-22 13:13:00','your personal information')
, (1,'2021-04-13 10:22:00','qualified')
, (1,'2021-04-22 13:13:00','qualified')
;


SELECT * FROM EVENTS
MATCH_RECOGNIZE
(   PARTITION BY USER_ID
    ORDER BY DATE_TIME
 
    MEASURES  classifier as class, MIN(IFF(CLASSIFIER = 'STEP_1', DATE_TIME, NULL)) AS REGISTRATION_NEW_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_2', DATE_TIME, NULL)) AS REGISTRATION_PENDING_CONFIRM_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_3', DATE_TIME, NULL)) AS YOUR_BUSINESS_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_4', DATE_TIME, NULL)) AS YOUR_PERSONAL_INFORMATION_TIMESTAMP,
             MIN(IFF(CLASSIFIER = 'STEP_5', DATE_TIME, NULL)) AS QUALIFIED_TIMESTAMP
 
  ONE ROW PER MATCH
 -- all rows per match
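  -- Any sequence of rows, each matching one of the step symbols or COINCIDENCE.
  PATTERN((step_1 | step_2 | step_3 | step_4 | step_5 | coincidence)*)
  -- Commented-out alternative pattern fragment: (STEP_2 | XX)* (STEP_3 | XXX)* (STEP_4 | XX)* (STEP_5 | XX)*)
  DEFINE
        STEP_1 AS EVENT_NAME = 'registration new',
        -- Steps 2 to 4 also require a timestamp strictly later than the previous row's.
        STEP_2 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'registration pending confirm',
        STEP_3 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your business information',
        STEP_4 AS LAG(DATE_TIME) < DATE_TIME AND EVENT_NAME = 'your personal information',
        STEP_5 AS EVENT_NAME = 'qualified',
        -- COINCIDENCE matches rows whose timestamp equals the previous row's.
        COINCIDENCE AS LAG(DATE_TIME) = DATE_TIME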
);
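
To make the two requirements from the question concrete (each step takes the earliest timestamp that is not before the previous step, and a skipped step must not drop the user), here is a minimal sketch that uses chained CTEs instead of MATCH_RECOGNIZE. It is an illustration, not a finished solution: the CTE names s1 to s5 and the columns t1 to t5 are hypothetical, and it assumes every user of interest has at least one 'registration new' event, as the original pattern does.

```sql
-- Sketch only: plain joins, not MATCH_RECOGNIZE.
-- s1..s5 and t1..t5 are hypothetical names introduced for illustration.
WITH s1 AS (
    -- Step 1: earliest 'registration new' per user.
    SELECT user_id, MIN(date_time) AS t1
    FROM events
    WHERE event_name = 'registration new'
    GROUP BY user_id
),
s2 AS (
    -- Step 2: earliest occurrence at or after step 1 (NULL if skipped).
    SELECT s1.user_id, s1.t1, MIN(e.date_time) AS t2
    FROM s1
    LEFT JOIN events e
           ON e.user_id = s1.user_id
          AND e.event_name = 'registration pending confirm'
          AND e.date_time >= s1.t1
    GROUP BY s1.user_id, s1.t1
),
s3 AS (
    -- Step 3: the floor is the latest step reached so far.
    SELECT s2.user_id, s2.t1, s2.t2, MIN(e.date_time) AS t3
    FROM s2
    LEFT JOIN events e
           ON e.user_id = s2.user_id
          AND e.event_name = 'your business information'
          AND e.date_time >= COALESCE(s2.t2, s2.t1)
    GROUP BY s2.user_id, s2.t1, s2.t2
),
s4 AS (
    SELECT s3.user_id, s3.t1, s3.t2, s3.t3, MIN(e.date_time) AS t4
    FROM s3
    LEFT JOIN events e
           ON e.user_id = s3.user_id
          AND e.event_name = 'your personal information'
          AND e.date_time >= COALESCE(s3.t3, s3.t2, s3.t1)
    GROUP BY s3.user_id, s3.t1, s3.t2, s3.t3
),
s5 AS (
    SELECT s4.user_id, s4.t1, s4.t2, s4.t3, s4.t4, MIN(e.date_time) AS t5
    FROM s4
    LEFT JOIN events e
           ON e.user_id = s4.user_id
          AND e.event_name = 'qualified'
          AND e.date_time >= COALESCE(s4.t4, s4.t3, s4.t2, s4.t1)
    GROUP BY s4.user_id, s4.t1, s4.t2, s4.t3, s4.t4
)
SELECT user_id,
       t1 AS registration_new_timestamp,
       t2 AS registration_pending_confirm_timestamp,
       t3 AS your_business_information_timestamp,
       t4 AS your_personal_information_timestamp,
       t5 AS qualified_timestamp
FROM s5
ORDER BY user_id;
```

On the sample rows above, user 1 should come out with 2020-11-26 15:24:00 for the first two steps, NULL for 'your business information', 2021-04-18 15:04:00 for 'your personal information', and 2021-04-22 13:13:00 for 'qualified'. The LEFT JOINs keep a user even when a step is missing, and COALESCE carries the last reached timestamp forward as the lower bound for the next step.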
answered 2021-05-25T06:08:58.293