timestamp id scope
2021-01-23 12:52:34.159999 UTC 1 enter_page
2021-01-23 12:53:02.342 UTC 1 view_product
2021-01-23 12:53:02.675 UTC 1 checkout
2021-01-23 12:53:04.342 UTC 1 search_page
2021-01-23 12:53:24.513 UTC 1 checkout

I am trying to use window (analytic) functions to get all the values in the "scope" column that lie between FIRST_VALUE and LAST_VALUE.

I already have first_value() = enter_page
and last_value() = checkout

by using window functions in SQLite:

FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY julianday(timestamp) ASC) first_page,
FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY julianday(timestamp) DESC) last_page

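I am trying to capture all the steps in between [excluding the edges]: view_product, apartment_view, checkout [, N-field], so that I can later add them to a string (unique values, STR_AGGR()).

Once that is done, I will try to find out whether the customer opened checkout more than once at some point during the purchase_journey.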
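
Both follow-up steps can stay in SQL. The sketch below is one possible approach, not a definitive one. It assumes the rows live in a table called tablename (a placeholder name) with the timestamps stored as text, and it uses Python's sqlite3 module only as a harness to run the statements; the two SQL strings carry the actual logic. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The table name "tablename" is an assumption made for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp TEXT, id INTEGER, scope TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("2021-01-23 12:52:34.159999 UTC", 1, "enter_page"),
        ("2021-01-23 12:53:02.342 UTC", 1, "view_product"),
        ("2021-01-23 12:53:02.675 UTC", 1, "checkout"),
        ("2021-01-23 12:53:04.342 UTC", 1, "search_page"),
        ("2021-01-23 12:53:24.513 UTC", 1, "checkout"),
    ],
)

# Unique steps strictly between the first and last row of each id.
# GROUP_CONCAT(DISTINCT ...) removes duplicates but does not promise an order.
unique_inner_sql = """
SELECT id, GROUP_CONCAT(DISTINCT scope) AS unique_inbetween
FROM (
  SELECT *,
         MIN(timestamp) OVER (PARTITION BY id) AS min_ts,
         MAX(timestamp) OVER (PARTITION BY id) AS max_ts
  FROM tablename
)
WHERE timestamp NOT IN (min_ts, max_ts)
GROUP BY id
"""
unique_inner = conn.execute(unique_inner_sql).fetchall()

# Ids that reached checkout more than once anywhere in the journey
# (edge rows included).
repeat_checkout_sql = """
SELECT id, COUNT(*) AS checkout_count
FROM tablename
WHERE scope = 'checkout'
GROUP BY id
HAVING COUNT(*) > 1
"""
repeat_checkouts = conn.execute(repeat_checkout_sql).fetchall()

print(unique_inner)
print(repeat_checkouts)
```

Counting the checkout rows directly avoids parsing the concatenated string afterwards, which is why the second statement does not build on the first.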

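My result should look like this:

id first_page last_page inbetween_pages
1 enter_page checkout view_product, checkout, search_page

P.S. I am trying to avoid using Python for this. I would like a "clean" way that uses pure SQL.

Many thanks, everyone.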

2 Answers

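You can do this with the GROUP_CONCAT() window function, which supports an ORDER BY clause, so the scopes in inbetween_pages come out in the correct order. The GROUP_CONCAT() aggregate function, by contrast, does not support ORDER BY (in SQLite versions before 3.44) and does not guarantee that its result is in any particular order: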

SELECT DISTINCT id, first_page, last_page,
       GROUP_CONCAT(CASE WHEN timestamp NOT IN (min_timestamp, max_timestamp) THEN scope END) 
       OVER (PARTITION BY id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) inbetween_pages
FROM (
  SELECT *,
         FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY timestamp) first_page,
         FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY timestamp DESC) last_page,
         MIN(timestamp) OVER (PARTITION BY id) min_timestamp,
         MAX(timestamp) OVER (PARTITION BY id) max_timestamp
  FROM tablename       
)

See the demo.
Results:

id first_page last_page inbetween_pages
1 enter_page checkout view_product,checkout,search_page
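
The explicit frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) is what lets SELECT DISTINCT collapse each id to a single row. With ORDER BY alone, the default frame ends at the current row, so the concatenation grows row by row. The sketch below illustrates the difference on the question's sample rows. It assumes a table named tablename with text timestamps, and Python's sqlite3 module is only a harness for running the SQL.

```python
import sqlite3

# Sample rows from the question; "tablename" is a placeholder table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp TEXT, id INTEGER, scope TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("2021-01-23 12:52:34.159999 UTC", 1, "enter_page"),
        ("2021-01-23 12:53:02.342 UTC", 1, "view_product"),
        ("2021-01-23 12:53:02.675 UTC", 1, "checkout"),
        ("2021-01-23 12:53:04.342 UTC", 1, "search_page"),
        ("2021-01-23 12:53:24.513 UTC", 1, "checkout"),
    ],
)

frame_sql = """
SELECT scope,
       -- default frame: from the partition start up to the current row
       GROUP_CONCAT(scope) OVER (PARTITION BY id ORDER BY timestamp) AS running,
       -- explicit frame: the whole partition, identical on every row
       GROUP_CONCAT(scope) OVER (
         PARTITION BY id ORDER BY timestamp
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS full_frame
FROM tablename
ORDER BY timestamp
"""
rows = conn.execute(frame_sql).fetchall()

for scope, running, full_frame in rows:
    print(scope, "|", running, "|", full_frame)
```

Only the full_frame column is constant within an id, so it is the one that DISTINCT can reduce to one row per id.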
answered 2021-01-28T18:29:46.267

Hmmm . . . I am thinking:

select id, group_concat(scope, ',')
from (select t.*,
             row_number() over (partition by id order by timestamp) as seqnum_asc,
             row_number() over (partition by id order by timestamp desc) as seqnum_desc
      from t
      order by id, timestamp
     ) t
where 1 not in (seqnum_asc, seqnum_desc)
group by id;

In SQLite, group_concat() does not accept an order by argument. My understanding is that it respects the ordering of the subquery, which is why the subquery has an order by.
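
If relying on the subquery's ordering feels too fragile, one variation (a sketch, not part of the query above) keeps the row_number() filter but moves the concatenation into a window with its own ORDER BY. The outer WHERE runs before the window function, so the window only sees the in-between rows. The table name t comes from the query above; the column types and the Python sqlite3 harness are assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question, loaded into the table "t" used by this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (timestamp TEXT, id INTEGER, scope TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2021-01-23 12:52:34.159999 UTC", 1, "enter_page"),
        ("2021-01-23 12:53:02.342 UTC", 1, "view_product"),
        ("2021-01-23 12:53:02.675 UTC", 1, "checkout"),
        ("2021-01-23 12:53:04.342 UTC", 1, "search_page"),
        ("2021-01-23 12:53:24.513 UTC", 1, "checkout"),
    ],
)

ordered_sql = """
SELECT DISTINCT id,
       GROUP_CONCAT(scope, ',') OVER (
         PARTITION BY id ORDER BY timestamp
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS inbetween_pages
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp) AS seqnum_asc,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC) AS seqnum_desc
      FROM t
     ) t
-- drop the first and last row of each id before concatenating
WHERE 1 NOT IN (seqnum_asc, seqnum_desc)
"""
inbetween = conn.execute(ordered_sql).fetchall()
print(inbetween)
```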

answered 2021-01-28T18:04:31.563