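I have a table with dates, events, and users. One of the events is named "A". In BigQuery SQL, I want to find out how many times a specific event occurred before and after event "A". Event A can occur multiple times, but the count in each direction should stop as soon as another event A is encountered.
For example,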

User    Date          Events
123     2018-02-14    X.Y.A
123     2018-02-12    X.Y.B
134     2018-02-10    Y.Z.A
123     2018-02-11    A
123     2018-02-01    X.Y.Z
134     2018-02-05    X.Y.B
134     2018-02-04    A
123     2018-02-13    A

The output would look like this:

User       Event    Before   After
123          A      1        1
123          A      0        1
134          A      0        1

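All other conditions stay the same.

This question is an extension of my previous question.

See "How to count the number of a specific event before another event in SQL BigQuery?" for details.

The events I need to count carry a specific prefix, meaning I have to check for events that start with X.Y. followed by some event name. So X.Y.SomeEvent is the kind of event I need to keep a counter for. Any suggestions?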

2 Answers

Below is for BigQuery Standard SQL

#standardSQL
WITH grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
-- ORDER BY user  

You can test / play with this using the dummy data from your example, as shown below

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user,  '2018-02-14' dt, 'X.Y.A' event UNION ALL
  SELECT 123,       '2018-02-13', 'A'     UNION ALL
  SELECT 123,       '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123,       '2018-02-11', 'A'     UNION ALL
  SELECT 123,       '2018-02-01', 'X.Y.Z' UNION ALL
  SELECT 134,       '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134,       '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134,       '2018-02-04', 'A'     
), grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
ORDER BY user  

The result is:

Row dt          user    event   before  after    
1   2018-02-11  123     A       1       1    
2   2018-02-13  123     A       1       1    
3   2018-02-04  134     A       0       1    
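
To make the grouping idea in the query above easier to follow, here is a minimal Python sketch of the same logic, not a replacement for the SQL. It assumes the sample rows from the question; the names `ROWS` and `count_around_a` are hypothetical and exist only for this illustration. A running count of A events per user plays the role of `grp`, "before" is the number of prefixed events in the previous group, and "after" is the number in the current group.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (user, dt, event).
ROWS = [
    (123, "2018-02-14", "X.Y.A"),
    (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"),
    (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"),
    (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
]


def count_around_a(rows, marker="A", prefix="X.Y."):
    """Return (dt, user, event, before, after) for every marker row."""
    by_user = defaultdict(list)
    for user, dt, event in rows:
        by_user[user].append((dt, event))

    result = []
    for user in sorted(by_user):
        grp = 0              # running count of marker events, like the grp column
        matches = Counter()  # grp -> number of prefixed events seen in that group
        markers = []         # (dt, grp) for each marker row
        # ISO-formatted date strings sort chronologically.
        for dt, event in sorted(by_user[user]):
            if event == marker:
                grp += 1
                markers.append((dt, grp))
            elif event.startswith(prefix):
                matches[grp] += 1
        for dt, g in markers:
            # before = prefixed events in the previous group,
            # after  = prefixed events in the marker's own group.
            result.append((dt, user, marker, matches[g - 1], matches[g]))
    return result


for row in count_around_a(ROWS):
    print(row)
```

On the sample data this yields the same three rows as the result table above: (1, 1) and (1, 1) for user 123, and (0, 1) for user 134.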
answered 2018-02-16T02:21:25.370

This is a more general problem. The same idea can be applied using lag() and lead():

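select userid,
       -- rows between the previous A (or the start of the partition) and this A
       (seqnum - lag(seqnum, 1, 0) over (partition by userid order by date) - 1) as before,
       -- rows between this A and the next A (or the end of the partition);
       -- cnt + 1 stands for a position just past the last row
       (coalesce(lead(seqnum) over (partition by userid order by date), cnt + 1) - seqnum - 1) as after
from (select t.*,
             row_number() over (partition by userid order by date) as seqnum,
             count(*) over (partition by userid) as cnt
      from t
      -- keep only the A events and the X.Y.-prefixed events to be counted
      where event like 'X.Y.%' or event = 'A'
     ) t
where event = 'A';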
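
The position arithmetic behind the lag()/lead() idea can be checked with a small Python sketch. It assumes the sample rows from the question; `ROWS` and `before_after_by_position` are hypothetical names used only here. After filtering to A and X.Y.-prefixed events, each row gets a 1-based position (the `seqnum`), and the counts are the gaps between consecutive A positions. The sketch uses `cnt + 1` as the fallback position after the last A, so that every trailing event is counted.

```python
from collections import defaultdict

# Sample rows from the question: (user, dt, event).
ROWS = [
    (123, "2018-02-14", "X.Y.A"),
    (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"),
    (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"),
    (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
]


def before_after_by_position(rows, marker="A", prefix="X.Y."):
    """Return (user, before, after) for every marker row, in date order."""
    by_user = defaultdict(list)
    for user, dt, event in rows:
        # Same filter as the inner query: keep markers and prefixed events only.
        if event == marker or event.startswith(prefix):
            by_user[user].append((dt, event))

    result = []
    for user in sorted(by_user):
        events = sorted(by_user[user])  # ISO date strings sort chronologically
        cnt = len(events)
        # 1-based positions of the marker rows, like row_number() as seqnum.
        positions = [i for i, (_, e) in enumerate(events, start=1) if e == marker]
        for k, seqnum in enumerate(positions):
            # lag: position of the previous marker, 0 if there is none.
            prev_pos = positions[k - 1] if k > 0 else 0
            # lead: position of the next marker, cnt + 1 if there is none.
            next_pos = positions[k + 1] if k + 1 < len(positions) else cnt + 1
            result.append((user, seqnum - prev_pos - 1, next_pos - seqnum - 1))
    return result


for row in before_after_by_position(ROWS):
    print(row)
```

Because the non-matching events are filtered out first, the gap between two positions is exactly the number of prefixed events between the two A rows.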
answered 2018-02-15T15:35:45.137