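Snowflake implements MATCH_RECOGNIZE, which is the easiest tool for finding complex patterns in pure SQL:
"Recognizes matches of a pattern in a set of rows. MATCH_RECOGNIZE accepts a set of rows (from a table, view, subquery, or other source) as input, and returns all matches for a given row pattern within this set. The pattern is defined similarly to a regular expression."
Data preparation: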
CREATE OR REPLACE TABLE t
AS
WITH t1 AS (
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:38:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:12:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 13:36:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 14:36:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 16:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 17:00:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-07-16 12:30:00'::timestamp AS created_at
)
SELECT * FROM t1;
Query, scenario 1:
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY CREATED_AT
-- MEASURES MATCH_NUMBER() AS m, --LAST/FIRST/CLASSIFIER/...
ALL ROWS PER MATCH
PATTERN (c+m+)
DEFINE
c AS status='created'
,m AS status='missing_info'
,p AS status='pending'
,s AS status='successful'
) mr
ORDER BY ID, CREATED_AT;
-- returns rows 1-4
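The key point here is the pattern, which is written as a Perl-style regular expression. In this case we are looking for one or more 'created' rows followed by one or more 'missing_info' rows.
ALL ROWS PER MATCH
- returns every row of each match; it can be changed to ONE ROW PER MATCH if a single row per match is enough.
MEASURES: specifies additional output columns that carry extra information, such as MATCH_NUMBER/MATCH_SEQUENCE_NUMBER/CLASSIFIER, depending on your needs.
With "|" (alternation), several patterns can be searched for in a single query: (c+ m+ | p m+ | ...)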
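To make the alternation note concrete, here is a minimal sketch against the sample table t above. It assumes Snowflake's pattern syntax with explicit grouping around each alternative; the aliases mn and cls are just illustrative names. CLASSIFIER() is added so that each returned row shows which pattern variable it matched.

```sql
-- Sketch: search for "created+ missing_info+" OR "pending missing_info+" in one pass.
SELECT *
FROM t
MATCH_RECOGNIZE (
  PARTITION BY id
  ORDER BY created_at
  MEASURES
    MATCH_NUMBER() AS mn,   -- sequential number of the match within the partition
    CLASSIFIER()   AS cls   -- pattern variable (c, m or p) the row was matched to
  ALL ROWS PER MATCH
  PATTERN ((c+ m+) | (p m+))
  DEFINE
    c AS status = 'created',
    m AS status = 'missing_info',
    p AS status = 'pending'
) mr
ORDER BY id, created_at;
```

Only the variables referenced in PATTERN are defined here; the mn column lets you tell the separate matches apart when more than one alternative fires within the same id.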
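EDIT:
"Thanks for your answer! It returns the first 4 rows. I basically need rows 1 and 4."
Once the groups are identified, the first and last row of each group can be filtered out, for example with QUALIFY. The key is to use MEASURES, which I mentioned earlier: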
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY CREATED_AT
MEASURES MATCH_NUMBER() AS mn,
MATCH_SEQUENCE_NUMBER() AS msn
ALL ROWS PER MATCH
PATTERN (c+m+)
DEFINE
c AS status='created'
,m AS status='missing_info'
,p AS status='pending'
,s AS status='successful'
) mr
QUALIFY (ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn) = 1)
OR(ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn DESC)=1)
ORDER BY ID, CREATED_AT;
-- returns the first and last row of each group identified by ID and MATCH_NUMBER
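If only the boundary timestamps of each match are needed, rather than the full first and last rows, ONE ROW PER MATCH may be a simpler alternative to the QUALIFY filter. This is a sketch under that assumption, again using the sample table t; the aliases first_created_at, last_created_at and rows_in_match are names I made up for illustration.

```sql
-- Sketch: collapse each "created+ missing_info+" match into one summary row.
SELECT *
FROM t
MATCH_RECOGNIZE (
  PARTITION BY id
  ORDER BY created_at
  MEASURES
    MATCH_NUMBER()    AS mn,                -- which match within the id
    FIRST(created_at) AS first_created_at,  -- timestamp of the first row in the match
    LAST(created_at)  AS last_created_at,   -- timestamp of the last row in the match
    COUNT(*)          AS rows_in_match      -- number of rows the match covers
  ONE ROW PER MATCH
  PATTERN (c+ m+)
  DEFINE
    c AS status = 'created',
    m AS status = 'missing_info'
) mr
ORDER BY id, mn;
```

Note the trade-off: with ONE ROW PER MATCH the output contains only the PARTITION BY columns and the MEASURES, so the QUALIFY version above is still the one to use when the original rows themselves (including status) are required.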
answered 2021-07-21T18:56:02.660