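I have read a few articles giving examples of what MATCH_RECOGNIZE can do. One of them is building a step funnel. Suppose we want to track certain events and see how many users drop off after each one. For example: landing on the home page, then going to the search page, then adding something to the cart, and finally paying. These are all events, and we have a record for each of them. Now I want to build a funnel like this: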

  1. Landed on the home page - 1000 users
  2. Went to the search page - 980 users
  3. Added something to the cart - 90 users
  4. Paid - 10 users

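This is an example of a step funnel: after each step we have fewer and fewer users. Now, back to match_recognize: we can use it to tell us how many users matched the whole pattern (event1+ event2+ event3+ event4+), but the problem I am trying to solve is how to use it to find out how many users did not move on to the next event/stage, rather than only those who matched the entire pattern sequence.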
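
To make the starting point concrete, here is a minimal sketch of the full-pattern match I mean, assuming Snowflake syntax. The `user_events` rows and the event names ('home_page', 'search', 'add_cart', 'pay') are made up for illustration and are not my real data.

```sql
-- Sketch only: sample rows and event names are illustrative assumptions.
WITH user_events(id, action_date, details) AS (
    SELECT * FROM VALUES
        (1, '2022-03-02', 'home_page'),
        (1, '2022-03-03', 'search'),
        (1, '2022-03-04', 'add_cart'),
        (1, '2022-03-05', 'pay'),
        (2, '2022-03-02', 'home_page'),
        (2, '2022-03-03', 'search')
), full_matches AS (
    SELECT *
    FROM user_events
    MATCH_RECOGNIZE (
        PARTITION BY id
        ORDER BY action_date          -- ISO date strings sort chronologically
        MEASURES MATCH_NUMBER() AS match_number
        ONE ROW PER MATCH
        -- every step must occur at least once, in this order
        PATTERN (event1+ event2+ event3+ event4+)
        DEFINE
            event1 AS details = 'home_page',
            event2 AS details = 'search',
            event3 AS details = 'add_cart',
            event4 AS details = 'pay'
    )
)
-- Only users who completed all four steps appear in full_matches.
SELECT COUNT(DISTINCT id) AS users_completing_funnel
FROM full_matches;
```

User 2 stops after the search page, so that user produces no match at all. That is exactly the information I am missing.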

1 Answer

So, using bits and a date limit for each step, you can build your own funnel logic and decide what counts as a cut-off:

WITH data(id, action_date, details) AS (
    SELECT * FROM VALUES
        (1,  '2022-03-02', 'home_page'),
        (1,  '2022-03-03', 'search'),
        (1,  '2022-03-04', 'add_cart'),
        (1,  '2022-03-05', 'pay'),

        (2,  '2022-03-02', 'home_page'),
        (2,  '2022-03-03', 'search'),
        (2,  '2022-03-04', 'add_cart'),

        (3,  '2022-03-02', 'home_page'),
        (3,  '2022-03-03', 'search'),

        (4,  '2022-03-02', 'home_page'),

        (5,  '2022-03-03', 'home_page'), -- missed the search step
        (5,  '2022-03-04', 'add_cart'),
    
        (6,  '2022-03-01', 'home_page'), -- gap between s1 & s2 too long
        (6,  '2022-03-05', 'search') 
), prep_a AS (
    SELECT id
        ,action_date
        ,details
        ,lag(details)over (partition by id order by action_date) as prior_detail
        ,datediff('seconds',lag(action_date,1,action_date)over(partition by id order by action_date),action_date)/86400 as prior_action_days_gap
        ,iff(details = 'home_page', action_date, null) as chain_start_date
        ,case
            when details = 'home_page' then 1
            when details = 'search' and prior_detail = 'home_page' and prior_action_days_gap <= 2 then 2
            when details = 'add_cart' and prior_detail = 'search' and prior_action_days_gap <= 1 then 4
            when details = 'pay' and prior_detail = 'add_cart' and prior_action_days_gap <= 1 then 8
            else 0
         end chain_bits
    FROM data
    --ORDER BY 1,2;
)
SELECT 
    chain_date
    ,count_if(funnel_home_page) as c_home_pages_step
    ,count_if(funnel_search) as c_search_step
    ,c_home_pages_step - c_search_step as c_search_step_drop
    ,count_if(funnel_cart) as c_cart_step
    ,c_search_step - c_cart_step as c_cart_step_drop
    ,count_if(funnel_pay) as c_pay_step
    ,c_cart_step - c_pay_step as c_pay_step_drop
FROM (
    SELECT *
        ,sum(chain_bits)over(partition by id, chain_date order by action_date) as chain_state
        ,chain_bits=1 as funnel_home_page
        ,chain_bits=2 and chain_state = 3 as funnel_search
        ,chain_bits=4 and chain_state = 7 as funnel_cart
        ,chain_bits=8 and chain_state = 15 as funnel_pay
    FROM (
        SELECT *
            ,nvl(chain_start_date, lag(chain_start_date) ignore nulls over (partition by id order by action_date)) as chain_date
        FROM prep_a
    )
)
GROUP BY 1

Gives:

CHAIN_DATE  C_HOME_PAGES_STEP  C_SEARCH_STEP  C_SEARCH_STEP_DROP  C_CART_STEP  C_CART_STEP_DROP  C_PAY_STEP  C_PAY_STEP_DROP
2022-03-02  4                  3              1                   2            1                 1           1
2022-03-03  1                  0              1                   0            0                 0           0
2022-03-01  1                  0              1                   0            0                 0           0
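
Since the question asked about MATCH_RECOGNIZE specifically, a rough sketch of the same funnel with optional pattern steps might look like the following. This is a different technique from the bit sums above, and I have not tuned it. The `s_*` symbol names and the `n_*` measures are my own illustrative choices, and the gap limits (2, 1, 1 days) simply mirror the CASE expression in the query above. Each match is one chain starting at a home_page row, and the per-symbol counts show how far that chain got.

```sql
-- Sketch only: sample rows copied from the query above; the s_* symbols,
-- n_* measures, and day-gap limits are illustrative assumptions.
WITH data(id, action_date, details) AS (
    SELECT * FROM VALUES
        (1, '2022-03-02', 'home_page'),
        (1, '2022-03-03', 'search'),
        (1, '2022-03-04', 'add_cart'),
        (1, '2022-03-05', 'pay'),
        (2, '2022-03-02', 'home_page'),
        (2, '2022-03-03', 'search'),
        (2, '2022-03-04', 'add_cart'),
        (3, '2022-03-02', 'home_page'),
        (3, '2022-03-03', 'search'),
        (4, '2022-03-02', 'home_page'),
        (5, '2022-03-03', 'home_page'), -- missed the search step
        (5, '2022-03-04', 'add_cart'),
        (6, '2022-03-01', 'home_page'), -- gap between s1 & s2 too long
        (6, '2022-03-05', 'search')
), chains AS (
    SELECT *
    FROM data
    MATCH_RECOGNIZE (
        PARTITION BY id
        ORDER BY action_date
        MEASURES
            FIRST_VALUE(action_date) AS chain_date,
            COUNT(s_search.*)        AS n_search,
            COUNT(s_cart.*)          AS n_cart,
            COUNT(s_pay.*)           AS n_pay
        ONE ROW PER MATCH
        -- home_page is required; each later step is optional and can only
        -- match if the step before it matched
        PATTERN (s_home (s_search (s_cart s_pay?)?)?)
        DEFINE
            s_home   AS details = 'home_page',
            s_search AS details = 'search'
                        AND DATEDIFF('day', LAG(action_date)::date, action_date::date) <= 2,
            s_cart   AS details = 'add_cart'
                        AND DATEDIFF('day', LAG(action_date)::date, action_date::date) <= 1,
            s_pay    AS details = 'pay'
                        AND DATEDIFF('day', LAG(action_date)::date, action_date::date) <= 1
    )
)
SELECT
    chain_date
    ,COUNT(*)               AS c_home_pages_step
    ,COUNT_IF(n_search > 0) AS c_search_step
    ,COUNT_IF(n_cart > 0)   AS c_cart_step
    ,COUNT_IF(n_pay > 0)    AS c_pay_step
FROM chains
GROUP BY chain_date;
```

The drop at each step is then the difference between neighbouring counts, as in the query above. Note that this counts chains rather than distinct users, so a user with several home_page visits contributes several rows.
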
Answered 2022-02-10T21:00:34.677