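Suppose my table structure looks like this
I plan to group it by (USER and SEQUENCE) and get the LEAD timestamp of the next sequence. This is the output I am looking for
If possible, can I solve this without using the LEAD function?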
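Below is for BigQuery Standard SQL
I will show two options: one with JOIN (just to prove that I correctly understood / reverse-engineered the expected logic), and then a JOIN-less version (note that I use ts as the field name instead of timestamp)
Using JOIN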
#standardSQL
SELECT a.user, a.sequence, MIN(b.ts) ts
FROM (
  SELECT user, sequence, MAX(ts) AS max_ts
  FROM `project.dataset.table`
  GROUP BY user, sequence
) a
LEFT JOIN `project.dataset.table` b
  ON a.user = b.user AND b.sequence = a.sequence + 1
WHERE a.max_ts <= IFNULL(b.ts, a.max_ts)
GROUP BY user, sequence
-- ORDER BY user, sequence
JOIN-less version
#standardSQL
SELECT
  user, sequence,
  (
    SELECT ts FROM UNNEST(arr_ts) ts
    WHERE max_ts < ts ORDER BY ts LIMIT 1
  ) ts
FROM (
  SELECT
    user, sequence, max_ts,
    LEAD(arr_ts) OVER (PARTITION BY user ORDER BY sequence) arr_ts
  FROM (
    SELECT
      user, sequence, MAX(ts) max_ts,
      ARRAY_AGG(ts ORDER BY ts) arr_ts
    FROM `project.dataset.table`
    GROUP BY user, sequence
  )
)
-- ORDER BY user, sequence
Both versions above can be tested / played with using the dummy data below
WITH `project.dataset.table` AS (
SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
SELECT 'user1', 2, 'T2' UNION ALL
SELECT 'user1', 1, 'T3' UNION ALL
SELECT 'user1', 1, 'T4' UNION ALL
SELECT 'user1', 3, 'T5' UNION ALL
SELECT 'user1', 2, 'T6' UNION ALL
SELECT 'user1', 3, 'T7' UNION ALL
SELECT 'user1', 3, 'T8'
)
and both return the result below
user   sequence  ts
user1  1         T6
user1  2         T7
user1  3         null
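To make the reverse-engineered logic explicit, here is a minimal Python sketch (not BigQuery code) that applies the same rule to the dummy data above: for each (user, sequence) group take the latest ts, then pick the earliest ts of the next sequence for the same user that is later than it. The helper name `next_sequence_ts` and the tuple layout are assumptions made for illustration. "Next sequence" is taken to mean sequence + 1, as in the JOIN version, and the comparison is strict, as in the JOIN-less version.

```python
from collections import defaultdict

# Dummy rows from the answer: (user, sequence, ts). The 'T1'..'T8' labels
# are compared as plain strings, just like in the SQL test data.
ROWS = [
    ("user1", 2, "T1"), ("user1", 2, "T2"), ("user1", 1, "T3"),
    ("user1", 1, "T4"), ("user1", 3, "T5"), ("user1", 2, "T6"),
    ("user1", 3, "T7"), ("user1", 3, "T8"),
]


def next_sequence_ts(rows):
    """Hypothetical helper mirroring the logic of the queries above."""
    # Collect every ts per (user, sequence) group.
    groups = defaultdict(list)
    for user, sequence, ts in rows:
        groups[(user, sequence)].append(ts)

    result = []
    for (user, sequence), ts_list in sorted(groups.items()):
        max_ts = max(ts_list)
        # Timestamps of the next sequence that come after this group's latest ts.
        later = [ts for ts in groups.get((user, sequence + 1), []) if ts > max_ts]
        result.append((user, sequence, min(later) if later else None))
    return result


for row in next_sequence_ts(ROWS):
    print(row)
```

On the dummy data this should give T6 for sequence 1, T7 for sequence 2 and no value for sequence 3, matching the table above.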
Not sure about BigQuery, but in generic SQL it would be written as:
select user, sequence,
       LEAD(max_timestamp, 1) OVER (PARTITION BY user ORDER BY sequence) as timestamp
from (
  select user, sequence, max(timestamp) as max_timestamp
  from table
  group by user, sequence) q1;
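Watch out for reserved words such as table, user, timestamp, etc.
Edit: Yes, forget this answer, I did not pay enough attention to the desired output. Mikhail got it right!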
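To see where the generic query above diverges from the desired output, here is a small Python sketch (an illustration only, not part of either answer) that emulates LEAD over the per-group MAX on the dummy data from the first answer. The helper name `lead_of_group_max` is hypothetical.

```python
# Dummy rows from the first answer: (user, sequence, ts).
ROWS = [
    ("user1", 2, "T1"), ("user1", 2, "T2"), ("user1", 1, "T3"),
    ("user1", 1, "T4"), ("user1", 3, "T5"), ("user1", 2, "T6"),
    ("user1", 3, "T7"), ("user1", 3, "T8"),
]


def lead_of_group_max(rows):
    """Hypothetical emulation of LEAD(MAX(ts)) partitioned by user, ordered by sequence."""
    # Inner query: MAX(ts) per (user, sequence).
    max_ts = {}
    for user, sequence, ts in rows:
        key = (user, sequence)
        max_ts[key] = max(max_ts.get(key, ts), ts)

    # Outer query: within each user, take the next group's MAX(ts).
    ordered = sorted(max_ts)
    result = []
    for i, (user, sequence) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        same_user = nxt is not None and nxt[0] == user
        result.append((user, sequence, max_ts[nxt] if same_user else None))
    return result


for row in lead_of_group_max(ROWS):
    print(row)
```

For sequence 2 this yields T8 (the latest ts of sequence 3), whereas the desired output is T7 (the earliest ts of sequence 3 that comes after T6), which is the difference the edit note refers to.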