
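Looking to do some cohort analysis on a userbase. We have 2 tables, "users" and "sessions", and both have a "created_at" field. I'm looking to formulate a query that yields a 7-by-7 table of numbers (with some blanks) showing me the count of users who were created on a particular day and who also created a session y days ago, for y = 0..6, indicating that they returned on that day.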

created_at  d2  d3  d4
today       *   *   *
today-1     49  *   *
today-2     45  30  *
today-3     47  48  18
...

In this example, 47 users who were created on today-3 returned on today-2.
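To make the meaning of each cell concrete, here is a minimal pure-Python sketch that builds the same matrix from in-memory data. It is an illustration, not part of the question. The sample users, sessions and the fixed "today" date are hypothetical. Each key is (signup offset, session offset), both counted in days before today, so key (3, 2) corresponds to row today-3, column d2 in the table above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: user id -> signup date, plus (user id, session date) pairs.
today = date(2013, 1, 22)
users = {1: date(2013, 1, 19), 2: date(2013, 1, 19), 3: date(2013, 1, 21)}
sessions = [
    (1, date(2013, 1, 20)),
    (1, date(2013, 1, 20)),  # a second session on the same day counts once
    (2, date(2013, 1, 20)),
    (3, date(2013, 1, 22)),
]


def cohort_matrix(users, sessions, today, days=7):
    """Map (signup offset, session offset) -> number of distinct returning users."""
    seen = defaultdict(set)
    for user_id, session_day in sessions:
        signup_off = (today - users[user_id]).days
        session_off = (today - session_day).days
        if 0 <= signup_off < days and 0 <= session_off < days:
            seen[(signup_off, session_off)].add(user_id)
    return {cell: len(ids) for cell, ids in seen.items()}


matrix = cohort_matrix(users, sessions, today)
print(matrix)
```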

Can I do this in a single MySQL query? I can run queries like the one below individually, but it would be really nice to do it all in one query.

SELECT `users`.* FROM `users` INNER JOIN `sessions` ON `sessions`.`user_id` = `users`.`id` WHERE `users`.`os` = 'ios' AND (`sessions`.`updated_at` BETWEEN '2013-01-16 08:00:00' AND '2013-01-17 08:00:00')

4 Answers


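This seems like a complex problem. Whether or not it seems a difficult one to you as well, it is never a bad idea to approach it by solving a smaller problem first.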

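For example, you could start with a query that returns all users (just users) who registered within the last week, i.e. from the day six days before today onward, as per your requirements: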

SELECT *
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY

The next step could be to group the results by date and count the rows in each group:

SELECT
  created_at,
  COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
GROUP BY created_at

If created_at is a datetime or timestamp, use DATE(created_at) as the grouping criterion instead:

SELECT
  DATE(created_at) AS created_at,
  COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
GROUP BY DATE(created_at)
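You can try the steps so far without a MySQL server by using an in-memory SQLite database from Python's standard library. This sketch assumes hypothetical sample rows and a fixed "today" of 2013-01-22. SQLite has no CURDATE() or INTERVAL syntax, so the sketch uses date(:today, '-6 days') in place of CURDATE() - INTERVAL 6 DAY, and SQLite's date(created_at) in place of DATE(created_at).

```python
import sqlite3

# In-memory SQLite stand-in for the users table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO users (id, created_at) VALUES (?, ?)",
    [
        (1, "2013-01-14 09:00:00"),  # more than 6 days before "today": filtered out
        (2, "2013-01-19 08:15:00"),
        (3, "2013-01-19 17:40:00"),
        (4, "2013-01-21 12:00:00"),
    ],
)

# "today" is a fixed parameter so the result is reproducible.
query = """
    SELECT date(created_at) AS signup_day, COUNT(*) AS user_count
    FROM users
    WHERE created_at >= date(:today, '-6 days')
    GROUP BY date(created_at)
    ORDER BY signup_day
"""
rows = conn.execute(query, {"today": "2013-01-22"}).fetchall()
print(rows)
```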

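However, you don't seem to need absolute dates in the output, only relative ones, like today and today - 1 day. In that case, you can use the DATEDIFF() function, which returns the number of days between two dates, to produce numeric offsets from today, and group by those values: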

SELECT
  DATEDIFF(CURDATE(), created_at) AS created_at,
  COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
GROUP BY DATEDIFF(CURDATE(), created_at)
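The day offsets can be illustrated with the same kind of SQLite sketch. SQLite has no DATEDIFF(). As a substitute, this sketch takes the julianday() difference of date-only values; like DATEDIFF(), it ignores the time of day. The sample rows and the fixed "today" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO users (id, created_at) VALUES (?, ?)",
    [(1, "2013-01-19 08:15:00"), (2, "2013-01-19 17:40:00"), (3, "2013-01-22 06:00:00")],
)

# julianday() of two date-only strings differs by a whole number of days,
# which is cast to INTEGER to mimic DATEDIFF(CURDATE(), created_at).
query = """
    SELECT CAST(julianday(:today) - julianday(date(created_at)) AS INTEGER) AS day_offset,
           COUNT(*) AS user_count
    FROM users
    WHERE created_at >= date(:today, '-6 days')
    GROUP BY day_offset
    ORDER BY day_offset
"""
rows = conn.execute(query, {"today": "2013-01-22"}).fetchall()
print(rows)
```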

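Your created_at column will now contain "dates" like 0, 1 and so on up to 6. Turning them into today, today-1 etc. is trivial, and you will see that in the final query. At this point, however, we need to take a step back (or perhaps half a step to the right), because we don't really need to count users, but rather their returns. So the actual working dataset needed from users at this point is: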

SELECT
  id,
  DATEDIFF(CURDATE(), created_at) AS day_offset
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY

We need the user ID to join this row set to the one that will be derived from sessions, and we need day_offset as the grouping criterion.

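Next, a similar transformation needs to be applied to the sessions table, which I won't describe in detail. Suffice it to say that the resulting query is very much like the previous one, with just two exceptions: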

  • id is replaced with user_id;

  • DISTINCT is applied to the whole subset.

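The reason for DISTINCT is to return no more than one row per user per day: as I understand it, no matter how many sessions a user has on a particular day, you want to count them as a single return. So, this is what is derived from sessions: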

SELECT DISTINCT
  user_id,
  DATEDIFF(CURDATE(), created_at) AS day_offset
FROM sessions
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
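To see the effect of DISTINCT, here is a sketch of the sessions-derived set in SQLite. The rows are hypothetical, and user 1 has two sessions on the same day. A julianday() difference again stands in for DATEDIFF(), and the "today" value is fixed for reproducibility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO sessions (user_id, created_at) VALUES (?, ?)",
    [
        (1, "2013-01-20 09:00:00"),
        (1, "2013-01-20 21:30:00"),  # same user, same day: collapsed by DISTINCT
        (1, "2013-01-21 10:00:00"),
        (2, "2013-01-20 11:00:00"),
    ],
)

# At most one (user_id, day_offset) row per user per day.
query = """
    SELECT DISTINCT
        user_id,
        CAST(julianday(:today) - julianday(date(created_at)) AS INTEGER) AS day_offset
    FROM sessions
    WHERE created_at >= date(:today, '-6 days')
    ORDER BY user_id, day_offset
"""
returns = conn.execute(query, {"today": "2013-01-22"}).fetchall()
print(returns)
```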

Now all that's left is to join the two derived tables, apply grouping and use conditional aggregation to get the desired results:

SELECT
  CONCAT('today', IFNULL(CONCAT('-', NULLIF(u.DayOffset, 0)), '')) AS created_at,
  SUM(s.DayOffset = 0) AS d0,
  SUM(s.DayOffset = 1) AS d1,
  SUM(s.DayOffset = 2) AS d2,
  SUM(s.DayOffset = 3) AS d3,
  SUM(s.DayOffset = 4) AS d4,
  SUM(s.DayOffset = 5) AS d5,
  SUM(s.DayOffset = 6) AS d6
FROM (
  SELECT
    id,
    DATEDIFF(CURDATE(), created_at) AS DayOffset
  FROM users
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) u
LEFT JOIN (
  SELECT DISTINCT
    user_id,
    DATEDIFF(CURDATE(), created_at) AS DayOffset
  FROM sessions
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) s
ON u.id = s.user_id
GROUP BY u.DayOffset
;

I must admit I haven't tested/debugged this, but I'll be happy to do so if needed, using data samples you provide, once you provide them. :)
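With no sample data available, one way to exercise the final query is to port it to SQLite and run it from Python on made-up rows. The substitutions are assumptions of this sketch, not part of the answer:

- || replaces CONCAT().
- julianday() replaces DATEDIFF().
- date(:today, '-6 days') replaces CURDATE() - INTERVAL 6 DAY.
- "today" is fixed at 2013-01-22.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users (id, created_at) VALUES
        (1, '2013-01-19 08:00:00'), (2, '2013-01-19 18:00:00'), (3, '2013-01-21 12:00:00');
    INSERT INTO sessions (user_id, created_at) VALUES
        (1, '2013-01-19 08:05:00'), (1, '2013-01-20 09:00:00'), (1, '2013-01-20 22:00:00'),
        (2, '2013-01-20 10:00:00'), (3, '2013-01-22 07:00:00');
""")

# SQLite stand-in for DATEDIFF(CURDATE(), created_at).
offset = "CAST(julianday(:today) - julianday(date(created_at)) AS INTEGER)"
# Builds SUM(s.day_offset = 0) AS d0, ..., SUM(s.day_offset = 6) AS d6.
columns = ",\n    ".join(f"SUM(s.day_offset = {d}) AS d{d}" for d in range(7))

query = f"""
SELECT
    'today' || IFNULL('-' || NULLIF(u.day_offset, 0), '') AS created_at,
    {columns}
FROM (
    SELECT id, {offset} AS day_offset
    FROM users
    WHERE created_at >= date(:today, '-6 days')
) u
LEFT JOIN (
    SELECT DISTINCT user_id, {offset} AS day_offset
    FROM sessions
    WHERE created_at >= date(:today, '-6 days')
) s ON u.id = s.user_id
GROUP BY u.day_offset
ORDER BY u.day_offset
"""
result = conn.execute(query, {"today": "2013-01-22"}).fetchall()
for row in result:
    print(row)
```

Users 1 and 2 signed up on today-3. User 1 has sessions on today-3 and today-2 (twice), and user 2 has one on today-2, so the today-3 row shows d3 = 1 and d2 = 2. There is a NULL pitfall for a cohort in which no user has any session in the window. Every comparison in that cohort is NULL, so the SUM(...) columns come back NULL rather than 0, and MySQL behaves the same way. Wrapping them in COALESCE(..., 0) gives zeros.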

answered 2013-01-22T18:17:25.947

A month-wise cohort example:

First, let's build a table of each user's individual activity, month by month. Note that order is a reserved word in MySQL, so the table name has to be quoted with backticks. Each EXISTS (...) yields 1 if the user has at least one order in the given month and 0 otherwise:

SELECT
    mu.created_timestamp AS cohort
    , mu.id AS user_id
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 1 AND l.user_id = mu.id) AS m1
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 2 AND l.user_id = mu.id) AS m2
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 3 AND l.user_id = mu.id) AS m3
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 4 AND l.user_id = mu.id) AS m4
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 5 AND l.user_id = mu.id) AS m5
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 6 AND l.user_id = mu.id) AS m6
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 7 AND l.user_id = mu.id) AS m7
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 8 AND l.user_id = mu.id) AS m8
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 9 AND l.user_id = mu.id) AS m9
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 10 AND l.user_id = mu.id) AS m10
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 11 AND l.user_id = mu.id) AS m11
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 12 AND l.user_id = mu.id) AS m12
FROM user mu
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59'
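Here is a sketch of the per-user activity table in SQLite via Python, with hypothetical data. As the SQLite counterpart of MONTH(), it assumes strftime('%m', ...) cast to an integer. It quotes "order" with double quotes, since ORDER is reserved in SQLite too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, created_timestamp TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, user_id INTEGER, order_date TEXT);
    INSERT INTO user (id, created_timestamp) VALUES
        (1, '2018-01-05 10:00:00'), (2, '2018-02-11 09:30:00');
    INSERT INTO "order" (user_id, order_date) VALUES
        (1, '2018-01-06'), (1, '2018-01-20'), (1, '2018-03-02'), (2, '2018-03-15');
""")

# One 0/1 flag per calendar month (m1..m12): EXISTS yields 1 if the user has
# at least one order in that month, else 0.
flags = ",\n    ".join(
    f"EXISTS (SELECT 1 FROM \"order\" l "
    f"WHERE CAST(strftime('%m', l.order_date) AS INTEGER) = {m} "
    f"AND l.user_id = mu.id) AS m{m}"
    for m in range(1, 13)
)
query = f"""
SELECT mu.created_timestamp AS cohort, mu.id AS user_id,
    {flags}
FROM user mu
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59'
ORDER BY mu.id
"""
activity = conn.execute(query).fetchall()
for row in activity:
    print(row)
```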

Then, on top of this table, sum up the users' individual activity per signup month:

SELECT MONTH(c.cohort) AS cohort
       ,COUNT(c.user_id) AS signups
       ,SUM(c.m1) AS m1 
       ,SUM(c.m2) AS m2 
       ,SUM(c.m3) AS m3 
       ,SUM(c.m4) AS m4 
       ,SUM(c.m5) AS m5 
       ,SUM(c.m6) AS m6 
       ,SUM(c.m7) AS m7 
       ,SUM(c.m8) AS m8 
       ,SUM(c.m9) AS m9 
       ,SUM(c.m10) AS m10 
       ,SUM(c.m11) AS m11 
       ,SUM(c.m12) AS m12 
FROM (SELECT
    mu.created_timestamp AS cohort
    , mu.id AS user_id
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 1 AND l.user_id = mu.id) AS m1
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 2 AND l.user_id = mu.id) AS m2
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 3 AND l.user_id = mu.id) AS m3
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 4 AND l.user_id = mu.id) AS m4
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 5 AND l.user_id = mu.id) AS m5
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 6 AND l.user_id = mu.id) AS m6
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 7 AND l.user_id = mu.id) AS m7
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 8 AND l.user_id = mu.id) AS m8
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 9 AND l.user_id = mu.id) AS m9
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 10 AND l.user_id = mu.id) AS m10
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 11 AND l.user_id = mu.id) AS m11
    , EXISTS (SELECT 1 FROM `order` l WHERE MONTH(l.order_date) = 12 AND l.user_id = mu.id) AS m12
FROM user mu
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59') AS c
GROUP BY MONTH(c.cohort)

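You can use days instead of months; that said, cohort analysis is mostly done on a monthly basis. Keep in mind that MONTH() ignores the year. With a date range spanning 2018 and 2019, activity in January 2018 and January 2019 lands in the same m1 column, and signups from the same month of both years fall into the same cohort.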
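The aggregation step can be sketched the same way in SQLite via Python. The data is hypothetical and kept within 2018 so that month numbers are unambiguous. strftime('%m', ...) cast to an integer is this sketch's stand-in for MONTH(). COUNT(c.user_id) gives the signups per cohort month, and each SUM(c.mN) counts how many of those users were active in month N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, created_timestamp TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, user_id INTEGER, order_date TEXT);
    INSERT INTO user (id, created_timestamp) VALUES
        (1, '2018-01-05 10:00:00'), (2, '2018-01-20 16:00:00'), (3, '2018-02-11 09:30:00');
    INSERT INTO "order" (user_id, order_date) VALUES
        (1, '2018-01-06'), (1, '2018-02-01'), (2, '2018-02-14'),
        (3, '2018-02-12'), (3, '2018-03-03');
""")

# Per-user 0/1 activity flags, as in the inner query.
flags = ",\n        ".join(
    f"EXISTS (SELECT 1 FROM \"order\" l "
    f"WHERE CAST(strftime('%m', l.order_date) AS INTEGER) = {m} "
    f"AND l.user_id = mu.id) AS m{m}"
    for m in range(1, 13)
)
sums = ", ".join(f"SUM(c.m{m}) AS m{m}" for m in range(1, 13))

query = f"""
SELECT CAST(strftime('%m', c.cohort) AS INTEGER) AS cohort_month,
       COUNT(c.user_id) AS signups,
       {sums}
FROM (
    SELECT mu.created_timestamp AS cohort, mu.id AS user_id,
        {flags}
    FROM user mu
    WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59'
) AS c
GROUP BY cohort_month
ORDER BY cohort_month
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```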

answered 2019-05-14T05:40:56.707

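This answer inverts the output table @Newy wanted, so that cohorts are rows rather than columns, and it uses absolute dates instead of relative ones.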

I was looking for a query that would give me something like this:

Date        d0  d1  d2  d3  d4  d5  d6
2016-11-03  3   1   0   0   0   0   0
2016-11-04  4   2   0   1   0   0   *
2016-11-05  7   0   1   1   0   *   *
2016-11-06  7   3   1   1   *   *   *
2016-11-07  13  5   1   *   *   *   *
2016-11-08  4   0   *   *   *   *   *
2016-11-09  1   *   *   *   *   *   *

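I was looking for the number of users who signed up on a given date, and then how many of them came back 1 day later, 2 days later, and so on. So on 2016-11-07, 13 users signed up and had a session; 5 of those users came back 1 day later, one user came back 2 days later, and so on.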

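I took the first subquery of @Andriy M's big query and modified it to return the date on which the user signed up, rather than the number of days relative to the current date: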

SELECT
    id,
    DATE(created_at) AS DayOffset
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY

Then my modified LEFT JOIN subquery looks like this:

SELECT DISTINCT
    sessions.user_id,
    DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
FROM sessions
LEFT JOIN users ON (users.id = sessions.user_id)
WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY

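I wanted DayOffset to be relative to the date the user signed up, rather than to the current date as in @Andriy M's answer. So I left-joined the users table to get the time each user signed up and took the DATEDIFF against that.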
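Here is a SQLite sketch of this signup-relative offset, with hypothetical rows. User 2 signs up late on 2016-11-07 and returns just after midnight on 2016-11-09. The sketch uses the julianday() difference of date-only values as its stand-in for DATEDIFF(). Both compare calendar dates and ignore the time of day, so that return counts as 2 days later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users (id, created_at) VALUES
        (1, '2016-11-07 10:00:00'), (2, '2016-11-07 23:00:00');
    INSERT INTO sessions (user_id, created_at) VALUES
        (1, '2016-11-07 10:05:00'), (1, '2016-11-08 09:00:00'), (1, '2016-11-08 20:00:00'),
        (2, '2016-11-09 01:00:00');
""")

# Days between signup date and session date (calendar dates only).
query = """
SELECT DISTINCT
    sessions.user_id,
    CAST(julianday(date(sessions.created_at)) - julianday(date(users.created_at)) AS INTEGER)
        AS day_offset
FROM sessions
LEFT JOIN users ON users.id = sessions.user_id
WHERE sessions.created_at >= date(:today, '-6 days')
ORDER BY 1, 2
"""
offsets = conn.execute(query, {"today": "2016-11-09"}).fetchall()
print(offsets)
```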

So the final query looks like this:

SELECT u.DayOffset AS Date,
  SUM(s.DayOffset = 0) AS d0,
  SUM(s.DayOffset = 1) AS d1,
  SUM(s.DayOffset = 2) AS d2,
  SUM(s.DayOffset = 3) AS d3,
  SUM(s.DayOffset = 4) AS d4,
  SUM(s.DayOffset = 5) AS d5,
  SUM(s.DayOffset = 6) AS d6
FROM (
  SELECT
    id,
    DATE(created_at) AS DayOffset
  FROM users
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) AS u
LEFT JOIN (
  SELECT DISTINCT
    sessions.user_id,
    DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
  FROM sessions
  LEFT JOIN users ON (users.id = sessions.user_id)
  WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
) AS s
ON s.user_id = u.id
GROUP BY u.DayOffset
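The corrected final query can be tried in SQLite from Python as well. This port runs on hypothetical data with a fixed "today" of 2016-11-09. It assumes these stand-ins:

- SQLite's date() for DATE().
- A julianday() difference for DATEDIFF().
- date(:today, '-6 days') for CURDATE() - INTERVAL 6 DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users (id, created_at) VALUES
        (1, '2016-11-07 10:00:00'), (2, '2016-11-07 15:00:00'),
        (3, '2016-11-08 09:00:00'), (4, '2016-11-08 12:00:00');
    INSERT INTO sessions (user_id, created_at) VALUES
        (1, '2016-11-07 10:01:00'), (1, '2016-11-08 11:00:00'),
        (2, '2016-11-07 15:02:00'),
        (3, '2016-11-08 09:01:00'), (3, '2016-11-09 08:00:00'), (3, '2016-11-09 19:00:00');
""")

# SUM(s.day_offset = 0) AS d0, ..., SUM(s.day_offset = 6) AS d6
columns = ",\n    ".join(f"SUM(s.day_offset = {d}) AS d{d}" for d in range(7))
query = f"""
SELECT u.signup_date AS signup_date,
    {columns}
FROM (
    SELECT id, date(created_at) AS signup_date
    FROM users
    WHERE created_at >= date(:today, '-6 days')
) AS u
LEFT JOIN (
    SELECT DISTINCT
        sessions.user_id,
        CAST(julianday(date(sessions.created_at)) - julianday(date(users.created_at)) AS INTEGER)
            AS day_offset
    FROM sessions
    LEFT JOIN users ON users.id = sessions.user_id
    WHERE sessions.created_at >= date(:today, '-6 days')
) AS s ON s.user_id = u.id
GROUP BY u.signup_date
ORDER BY u.signup_date
"""
cohorts = conn.execute(query, {"today": "2016-11-09"}).fetchall()
for row in cohorts:
    print(row)
```

User 4 signed up on 2016-11-08 but has no sessions at all. Its LEFT JOIN row carries a NULL offset, which SUM() ignores, so it does not change that cohort's counts.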

answered 2016-11-09T15:35:42.443

Monthly cohorts, based on @Newy's response:

SELECT u.MonthOffset AS MONTH,

  SUM(s.MonthOffset = 0) AS m0,
  SUM(s.MonthOffset = 1) AS m1,
  SUM(s.MonthOffset = 2) AS m2,
  SUM(s.MonthOffset = 3) AS m3,
  SUM(s.MonthOffset = 4) AS m4,
  SUM(s.MonthOffset = 5) AS m5,
  SUM(s.MonthOffset = 6) AS m6
FROM (
 SELECT
    id,
    TIMESTAMPDIFF(month, DATE(date), CURDATE()) AS MonthOffset
  FROM users
  WHERE date >= CURDATE() - INTERVAL 6 month
) AS u
LEFT JOIN (
    SELECT DISTINCT
    user_id,
    TIMESTAMPDIFF(month, DATE(date), CURDATE()) AS MonthOffset
    FROM sessions
    WHERE sessions.date >= CURDATE() - INTERVAL 6 month
) AS s
ON s.user_id = u.id
GROUP BY u.MonthOffset;  
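SQLite has no TIMESTAMPDIFF(), so this sketch registers a hypothetical Python helper, full_months(), through sqlite3's create_function(). The helper approximates TIMESTAMPDIFF(MONTH, start, end) for date-only values with start <= end, counting only complete months. The data and the fixed "today" of 2020-01-20 are made up.

```python
import sqlite3
from datetime import date


def full_months(start: str, end: str) -> int:
    """Hypothetical helper: whole months from start to end (start <= end).

    Intended to mimic TIMESTAMPDIFF(MONTH, start, end) for DATE values,
    where a month only counts once it is complete.
    """
    s, e = date.fromisoformat(start[:10]), date.fromisoformat(end[:10])
    months = (e.year - s.year) * 12 + (e.month - s.month)
    if e.day < s.day:
        months -= 1  # the last month is not complete yet
    return months


conn = sqlite3.connect(":memory:")
conn.create_function("full_months", 2, full_months)
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    INSERT INTO users (id, date) VALUES
        (1, '2019-11-05'), (2, '2019-12-25'), (3, '2020-01-02');
    INSERT INTO sessions (user_id, date) VALUES
        (1, '2019-11-10'), (1, '2019-12-01'), (1, '2020-01-15'), (2, '2020-01-10');
""")

# SUM(s.month_offset = 0) AS m0, ..., SUM(s.month_offset = 6) AS m6
columns = ",\n    ".join(f"SUM(s.month_offset = {m}) AS m{m}" for m in range(7))
query = f"""
SELECT u.month_offset AS month,
    {columns}
FROM (
    SELECT id, full_months(date, :today) AS month_offset
    FROM users
    WHERE date >= date(:today, '-6 months')
) AS u
LEFT JOIN (
    SELECT DISTINCT user_id, full_months(date, :today) AS month_offset
    FROM sessions
    WHERE date >= date(:today, '-6 months')
) AS s ON s.user_id = u.id
GROUP BY u.month_offset
ORDER BY u.month_offset
"""
cohorts = conn.execute(query, {"today": "2020-01-20"}).fetchall()
for row in cohorts:
    print(row)
```

Only complete months count, so as of 2020-01-20 user 2 (signed up 2019-12-25) lands in the same cohort as user 3 (2020-01-02).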

answered 2020-01-20T16:16:36.870