My use case requires treating sessions that are less than 60 seconds apart as the same session.

The data looks like this.

Min_Timestamp                Max_Timestamp                Device_ID  Session_ID  Prev_Max_Timestamp           Diff_Sec
2019-12-03 23:05:30.416 UTC  2019-12-03 23:09:13.502 UTC  AAAAA      I90HYTRFJI  null                         null
2019-12-03 23:09:21.517 UTC  2019-12-03 23:09:53.353 UTC  AAAAA      98UHIGSNJR  2019-12-03 23:09:13.502 UTC  8
2019-12-03 00:00:28.933 UTC  2019-12-03 00:09:03.473 UTC  BBBBB      32QE8Y76TG  null                         null
2019-12-03 00:09:19.106 UTC  2019-12-03 00:23:26.554 UTC  BBBBB      R4GUY432AD  2019-12-03 00:09:03.473 UTC  16
2019-12-03 00:23:26.818 UTC  2019-12-03 00:23:26.837 UTC  BBBBB      E32GUYE328  2019-12-03 00:23:26.554 UTC  0
2019-12-03 17:00:32.160 UTC  2019-12-03 17:03:48.758 UTC  BBBBB      GY1EW32876  2019-12-03 00:23:26.837 UTC  59825
2019-12-03 17:03:58.069 UTC  2019-12-03 17:17:12.408 UTC  BBBBB      2876T128Y7  2019-12-03 17:03:48.758 UTC  9
2019-12-03 17:18:24.528 UTC  2019-12-03 17:18:27.516 UTC  BBBBB      098U6598U5  2019-12-03 17:17:12.408 UTC  73
2019-12-03 16:30:29.970 UTC  2019-12-03 18:44:18.972 UTC  CCCCC      UWI4UII2J4  null                         null
2019-12-04 17:32:19.285 UTC  2019-12-04 17:32:24.668 UTC  CCCCC      G3247ROIUH  2019-12-03 18:44:18.972 UTC  82080

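Sessions that are less than 60 seconds apart should be grouped together, while still being kept separate per device. It would look like this.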

Min_Timestamp                Max_Timestamp                Device_ID  Session_ID  Prev_Max_Timestamp           Diff_Sec
2019-12-03 23:05:30.416 UTC  2019-12-03 23:09:13.502 UTC  AAAAA      I90HYTRFJI  null                         null
2019-12-03 23:09:21.517 UTC  2019-12-03 23:09:53.353 UTC  AAAAA      98UHIGSNJR  2019-12-03 23:09:13.502 UTC  8

2019-12-03 00:00:28.933 UTC  2019-12-03 00:09:03.473 UTC  BBBBB      32QE8Y76TG  null                         null
2019-12-03 00:09:19.106 UTC  2019-12-03 00:23:26.554 UTC  BBBBB      R4GUY432AD  2019-12-03 00:09:03.473 UTC  16
2019-12-03 00:23:26.818 UTC  2019-12-03 00:23:26.837 UTC  BBBBB      E32GUYE328  2019-12-03 00:23:26.554 UTC  0

2019-12-03 17:00:32.160 UTC  2019-12-03 17:03:48.758 UTC  BBBBB      GY1EW32876  2019-12-03 00:23:26.837 UTC  59825
2019-12-03 17:03:58.069 UTC  2019-12-03 17:17:12.408 UTC  BBBBB      2876T128Y7  2019-12-03 17:03:48.758 UTC  9
2019-12-03 17:18:24.528 UTC  2019-12-03 17:18:27.516 UTC  BBBBB      098U6598U5  2019-12-03 17:17:12.408 UTC  73

2019-12-03 16:30:29.970 UTC  2019-12-03 18:44:18.972 UTC  CCCCC      UWI4UII2J4  null                         null

2019-12-04 17:32:19.285 UTC  2019-12-04 17:32:24.668 UTC  CCCCC      G3247ROIUH  2019-12-03 18:44:18.972 UTC  82080

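I would like to end up with something like this. The Session_ID does not need to look like A1, B1, C1, etc.; it can simply be the first Session_ID value of the group. Note that the latest Max_Timestamp in each group is now the new Max_Timestamp.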

Min_Timestamp                Max_Timestamp                Device_ID  Session_ID
2019-12-03 23:05:30.416 UTC  2019-12-03 23:09:53.353 UTC  AAAAA      A1          
2019-12-03 00:00:28.933 UTC  2019-12-03 00:23:26.837 UTC  BBBBB      B1
2019-12-03 17:00:32.160 UTC  2019-12-03 17:18:27.516 UTC  BBBBB      B2
2019-12-03 16:30:29.970 UTC  2019-12-03 18:44:18.972 UTC  CCCCC      C1
2019-12-04 17:32:19.285 UTC  2019-12-04 17:32:24.668 UTC  CCCCC      C2

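My idea is to give all rows that belong to the same group the same Session_ID, and then GROUP BY Device_ID, Session_ID to get MIN(Min_Timestamp) and MAX(Max_Timestamp). I tried playing with FIRST_VALUE() on Session_ID, but I could not figure out how to partition it correctly.

Ideally this would be done in legacy SQL. If not, standard SQL works too.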

1 Answer

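Below is for BigQuery Standard SQL. You can "translate" it to legacy SQL if you wish, but the recommendation is still to migrate to Standard SQL: do it now and use the query below as is.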

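#standardSQL
-- Outer query: collapse each group of rows into one merged session per device.
SELECT MIN(Min_Timestamp) AS Min_Timestamp, MAX(Max_Timestamp) AS Max_Timestamp, Device_ID, Session_ID
FROM (
  -- The running count of flags per device numbers the merged sessions 1, 2, 3, ...
  SELECT * EXCEPT(flag, Session_ID), 
    CONCAT(Device_ID, CAST(COUNTIF(flag) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp) AS STRING)) AS Session_ID
  FROM (
    -- flag is TRUE when a row starts a new session: either it is the first row
    -- of a device (NULL gap replaced by 999) or the gap is more than 60 seconds.
    SELECT *, 
      IFNULL(TIMESTAMP_DIFF(Min_Timestamp, LAG(Max_Timestamp) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp), SECOND), 999) > 60 AS flag
    FROM `project.dataset.table`
  )
)
GROUP BY Device_ID, Session_ID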
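
To see how the grouping comes about, it can help to look at the intermediate columns before anything is aggregated. The following is only a sketch of that idea: it reuses the same `project.dataset.table` placeholder, and the column names gap_sec and session_no are made up here for illustration.

```sql
#standardSQL
-- Sketch only: gap_sec and session_no are illustrative names, not part of the original query.
SELECT Device_ID, Session_ID, Min_Timestamp, Max_Timestamp, gap_sec, flag,
  -- Running count of "new session" flags within each device.
  COUNTIF(flag) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp) AS session_no
FROM (
  SELECT *,
    -- A NULL gap (first row of a device) is treated as a large gap.
    IFNULL(gap_sec, 999) > 60 AS flag
  FROM (
    SELECT *,
      -- Whole seconds between this row's start and the previous row's end.
      TIMESTAMP_DIFF(Min_Timestamp, LAG(Max_Timestamp) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp), SECOND) AS gap_sec
    FROM `project.dataset.table`
  )
)
ORDER BY Device_ID, Max_Timestamp
```

Rows that share the same Device_ID and session_no are the ones the main query merges into a single row.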

You can test and play with the above using the sample data from your question, as in the example below.

#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2019-12-03 23:05:30.416 UTC' Min_Timestamp, TIMESTAMP '2019-12-03 23:09:13.502 UTC' Max_Timestamp, 'AAAAA' Device_ID, 'I90HYTRFJI' Session_ID UNION ALL
  SELECT '2019-12-03 23:09:21.517 UTC', '2019-12-03 23:09:53.353 UTC', 'AAAAA', '98UHIGSNJR' UNION ALL
  SELECT '2019-12-03 00:00:28.933 UTC', '2019-12-03 00:09:03.473 UTC', 'BBBBB', '32QE8Y76TG' UNION ALL
  SELECT '2019-12-03 00:09:19.106 UTC', '2019-12-03 00:23:26.554 UTC', 'BBBBB', 'R4GUY432AD' UNION ALL
  SELECT '2019-12-03 00:23:26.818 UTC', '2019-12-03 00:23:26.837 UTC', 'BBBBB', 'E32GUYE328' UNION ALL
  SELECT '2019-12-03 17:00:32.160 UTC', '2019-12-03 17:03:48.758 UTC', 'BBBBB', 'GY1EW32876' UNION ALL
  SELECT '2019-12-03 17:03:58.069 UTC', '2019-12-03 17:17:12.408 UTC', 'BBBBB', '2876T128Y7' UNION ALL
  SELECT '2019-12-03 17:18:24.528 UTC', '2019-12-03 17:18:27.516 UTC', 'BBBBB', '098U6598U5' UNION ALL
  SELECT '2019-12-03 16:30:29.970 UTC', '2019-12-03 18:44:18.972 UTC', 'CCCCC', 'UWI4UII2J4' UNION ALL
  SELECT '2019-12-04 17:32:19.285 UTC', '2019-12-04 17:32:24.668 UTC', 'CCCCC', 'G3247ROIUH' 
)
SELECT MIN(Min_Timestamp) AS Min_Timestamp, MAX(Max_Timestamp) AS Max_Timestamp, Device_ID, Session_ID
FROM (
  SELECT * EXCEPT(flag, Session_ID), 
    CONCAT(Device_ID, CAST(COUNTIF(flag) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp) AS STRING)) AS Session_ID
  FROM (
    SELECT *, 
      IFNULL(TIMESTAMP_DIFF(Min_Timestamp, LAG(Max_Timestamp) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp), SECOND), 999) > 60 flag
    FROM `project.dataset.table`
  )
)
GROUP BY Device_ID, Session_ID
-- ORDER BY Device_ID, Session_ID  

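with the output below. Note that it has six rows rather than the five in the question's expected result: the gap before session 098U6598U5 (Diff_Sec 73 in the question) is over 60 seconds, so that row starts a session of its own (BBBBB3).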

Row Min_Timestamp               Max_Timestamp               Device_ID   Session_ID   
1   2019-12-03 23:05:30.416 UTC 2019-12-03 23:09:53.353 UTC AAAAA       AAAAA1   
2   2019-12-03 00:00:28.933 UTC 2019-12-03 00:23:26.837 UTC BBBBB       BBBBB1   
3   2019-12-03 17:00:32.160 UTC 2019-12-03 17:17:12.408 UTC BBBBB       BBBBB2   
4   2019-12-03 17:18:24.528 UTC 2019-12-03 17:18:27.516 UTC BBBBB       BBBBB3   
5   2019-12-03 16:30:29.970 UTC 2019-12-03 18:44:18.972 UTC CCCCC       CCCCC1   
6   2019-12-04 17:32:19.285 UTC 2019-12-04 17:32:24.668 UTC CCCCC       CCCCC2     
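
The question also mentions that the merged Session_ID could simply be the first Session_ID of each group, and asks how to partition FIRST_VALUE(). One possible way, sketched here under the assumption that Session_ID values are unique within a device, is to compute the running group number first and then partition by Device_ID plus that number. The column name grp is made up for this sketch.

```sql
#standardSQL
-- Sketch only: grp is an illustrative helper column.
-- Assumes Session_ID values are unique within a device.
SELECT MIN(Min_Timestamp) AS Min_Timestamp, MAX(Max_Timestamp) AS Max_Timestamp, Device_ID, Session_ID
FROM (
  SELECT * EXCEPT(flag, grp, Session_ID),
    -- Label every row with the Session_ID of the earliest row in its group.
    FIRST_VALUE(Session_ID) OVER(PARTITION BY Device_ID, grp ORDER BY Min_Timestamp) AS Session_ID
  FROM (
    SELECT *,
      -- Running count of flags per device gives the group number.
      COUNTIF(flag) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp) AS grp
    FROM (
      SELECT *,
        IFNULL(TIMESTAMP_DIFF(Min_Timestamp, LAG(Max_Timestamp) OVER(PARTITION BY Device_ID ORDER BY Max_Timestamp), SECOND), 999) > 60 AS flag
      FROM `project.dataset.table`
    )
  )
)
GROUP BY Device_ID, Session_ID
```

On the sample data from the question, the six merged sessions should then be labelled I90HYTRFJI, 32QE8Y76TG, GY1EW32876, 098U6598U5, UWI4UII2J4 and G3247ROIUH instead of AAAAA1, BBBBB1, and so on. The extra nesting level is needed because a window function cannot refer to another window function's result in the same SELECT list.
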
answered 2019-12-05T03:05:31.213