There is a table with visit data:
uid (INT) | created_at (DATETIME)
I want to know how many consecutive days a user has visited our app. For example:
SELECT DISTINCT DATE(created_at) AS d FROM visits WHERE uid = 123
returns:
d
------------
2012-04-28
2012-04-29
2012-04-30
2012-05-03
2012-05-04
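There are 5 records and two intervals: 3 days (April 28-30) and 2 days (May 3-4).
My question is how to find the maximum number of consecutive days on which a user visited the app (3 days in the example). I looked for a suitable function in the SQL docs but had no luck. Am I missing something?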
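To make the goal concrete, here is a minimal sketch in Python (not SQL, and not specific to any database) of the computation being asked for. It assumes the distinct visit dates have already been fetched by the query above; the function name `longest_streak` is hypothetical and only for illustration.

```python
from datetime import date, timedelta

def longest_streak(days):
    """Return the length of the longest run of consecutive calendar days."""
    best = current = 0
    previous = None
    # Sort and de-duplicate so each day is looked at once, in order.
    for d in sorted(set(days)):
        if previous is not None and d - previous == timedelta(days=1):
            current += 1      # the run continues
        else:
            current = 1       # a gap (or the first day) starts a new run
        best = max(best, current)
        previous = d
    return best

# The dates from the example result set above.
visits = [
    date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
    date(2012, 5, 3), date(2012, 5, 4),
]
print(longest_streak(visits))  # 3
```

The open question is how to express the same thing inside the database.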
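UPD: Thanks for your answers! I'm actually using the Vertica analytics database (http://vertica.com/), but it is a fairly rare product and only a few people have experience with it. It does support the SQL-99 standard, though.
Well, most of the solutions needed slight modifications. In the end I wrote my own version of the query: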
-- returns the start of each visit series
SELECT t1.d as s FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
s
---------------------
2012-04-28 01:00:00
2012-05-03 01:00:00
-- returns the end of each visit series
SELECT t1.d as f FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
f
---------------------
2012-04-30 01:00:00
2012-05-04 01:00:00
So now we only need to join them somehow, for example by row index.
SELECT s, f, DATEDIFF(day, s, f) + 1 as seq FROM (
-- number the starts in date order so the k-th start is paired with the k-th end
SELECT t1.d as s, ROW_NUMBER() OVER (ORDER BY t1.d) as o1 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl1 LEFT JOIN (
-- number the ends the same way
SELECT t1.d as f, ROW_NUMBER() OVER (ORDER BY t1.d) as o2 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl2 ON o1 = o2
Sample output:
s | f | seq
---------------------+---------------------+-----
2012-04-28 01:00:00 | 2012-04-30 01:00:00 | 3
2012-05-03 01:00:00 | 2012-05-04 01:00:00 | 2
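The same start/end pairing can be sketched outside the database to check the logic and to take the final maximum. This is a minimal illustration in Python, not the Vertica query itself: it assumes the visits are already reduced to calendar dates, and the name `visit_series` is hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def visit_series(days):
    """Return (start, end, length) for every run of consecutive days."""
    present = set(days)
    # A start has no visit on the previous day; an end has none on the next day.
    starts = sorted(d for d in present if d - ONE_DAY not in present)
    ends = sorted(d for d in present if d + ONE_DAY not in present)
    # In date order the k-th start belongs to the k-th end,
    # which is what joining on a row index relies on.
    return [(s, f, (f - s).days + 1) for s, f in zip(starts, ends)]

days = [
    date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
    date(2012, 5, 3), date(2012, 5, 4),
]
series = visit_series(days)
# The longest streak is the largest length among the series.
longest = max((length for _, _, length in series), default=0)
print(longest)  # 3
```

The pairing only works because both lists are sorted by date, so the row index in the query needs a defined order as well. In SQL, the last step would be taking the maximum of the `seq` column.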