mysql - How to include dates with zero messages in the result set anyway?

Question

I have the following messages table:

+---------+---------+------------+----------+
| msg_id  | user_id | m_date     |  m_time  |
+---------+---------+------------+----------+
|   1     | 1       | 2011-01-22 | 06:23:11 |
|   2     | 1       | 2011-01-23 | 16:17:03 |
|   3     | 1       | 2011-01-23 | 17:05:45 |
|   4     | 2       | 2011-01-22 | 23:58:13 |
|   5     | 2       | 2011-01-23 | 23:59:32 |
|   6     | 2       | 2011-01-24 | 21:02:41 |
|   7     | 3       | 2011-01-22 | 13:45:00 |
|   8     | 3       | 2011-01-23 | 13:22:34 |
|   9     | 3       | 2011-01-23 | 18:22:34 |
|  10     | 3       | 2011-01-24 | 02:22:22 |
|  11     | 3       | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+

What I want is to see, for each day, how many messages each user sent before and after 16:00:

SELECT 
    user_id, 
    m_date, 
    SUM(m_time <= '16:00') AS before16, 
    SUM(m_time > '16:00') AS after16 
FROM messages 
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC

This produces:

user_id m_date      before16  after16
-------------------------------------
1       2011-01-22  1         0
1       2011-01-23  0         2
2       2011-01-22  0         1
2       2011-01-23  0         1
2       2011-01-24  0         1
3       2011-01-22  1         0
3       2011-01-23  1         1
3       2011-01-24  2         0
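To see the gap concretely, here is a minimal sketch that loads the question's messages table into an in-memory SQLite database and runs the same GROUP BY query. Python's sqlite3 module and SQLite are only a convenient, runnable stand-in for MySQL (an assumption, not what the asker uses). Times are stored as zero-padded 'HH:MM:SS' text so string comparison is chronological, and '16:00:00' is used instead of '16:00' so that the text comparison treats 16:00:00 itself as "before".

```python
import sqlite3

# The question's messages table: (msg_id, user_id, m_date, m_time).
# Zero-padded 'HH:MM:SS' text compares chronologically as a string.
MESSAGES = [
    (1, 1, "2011-01-22", "06:23:11"),
    (2, 1, "2011-01-23", "16:17:03"),
    (3, 1, "2011-01-23", "17:05:45"),
    (4, 2, "2011-01-22", "23:58:13"),
    (5, 2, "2011-01-23", "23:59:32"),
    (6, 2, "2011-01-24", "21:02:41"),
    (7, 3, "2011-01-22", "13:45:00"),
    (8, 3, "2011-01-23", "13:22:34"),
    (9, 3, "2011-01-23", "18:22:34"),
    (10, 3, "2011-01-24", "02:22:22"),
    (11, 3, "2011-01-24", "13:12:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, m_date TEXT, m_time TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", MESSAGES)

# Same shape as the question's query: one row per (user, day) that HAS messages.
rows = conn.execute(
    """
    SELECT user_id, m_date,
           SUM(m_time <= '16:00:00') AS before16,
           SUM(m_time >  '16:00:00') AS after16
    FROM messages
    GROUP BY user_id, m_date
    ORDER BY user_id, m_date
    """
).fetchall()

for row in rows:
    print(row)

# User 1 sent nothing on 2011-01-24, so that (user, day) pair is simply absent.
dates_for_user_1 = [d for (u, d, _, _) in rows if u == 1]
print(dates_for_user_1)  # ['2011-01-22', '2011-01-23']
```

A GROUP BY can only produce groups from rows that exist, which is why the missing days have to come from another table.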

Because user 1 didn't write any messages on 2011-01-24, that date is missing from the result set, which is not what I want. My database has a second table called "date_range":

+---------+------------+
| date_id | d_date     |
+---------+------------+
| 1       | 2011-01-21 |
| 1       | 2011-01-22 |
| 1       | 2011-01-23 |
| 1       | 2011-01-24 |
+---------+------------+

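I want to check "messages" against this table: for each user, all of these dates must appear in the result set. As you can see, no user wrote a message on 2011-01-21, and, as mentioned, user 1 has no messages on 2011-01-24. The desired output of the query would be: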

user_id d_date      before16  after16
-------------------------------------
1       2011-01-21  0         0
1       2011-01-22  1         0
1       2011-01-23  0         2
1       2011-01-24  0         0
2       2011-01-21  0         0
2       2011-01-22  0         1
2       2011-01-23  0         1
2       2011-01-24  0         1
3       2011-01-21  0         0
3       2011-01-22  1         0
3       2011-01-23  1         1
3       2011-01-24  2         0

How can I link these two tables so that the query result also contains rows with zero values for before16 and after16?

Edit: Yes, I do have a "users" table:

+---------+------------+
| user_id | user_date  |
+---------+------------+
| 1       | foo        |
| 2       | bar        |
| 3       | foobar     |
+---------+------------+

score 2 · Accepted Answer

Test bed:

create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
       (1,1,'2011-01-22','06:23:11'),
       (2,1,'2011-01-23','16:17:03'),
       (3,1,'2011-01-23','17:05:05');
insert into date_range values
       (1, '2011-01-21'),
       (1, '2011-01-22'),
       (1, '2011-01-23'),
       (1, '2011-01-24');

Query:

SELECT p._date, p.user_id,
       coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
  FROM
      (SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
  LEFT JOIN
      (SELECT user_id, _date,
              SUM(_time <= '16:00') AS before16,
              SUM(_time > '16:00') AS after16 
         FROM messages 
        GROUP BY user_id, _date) m
    ON p.user_id = m.user_id AND p._date = m._date
 ORDER BY p.user_id, p._date;
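Here is a minimal sketch that runs this query against the test bed above, using Python's sqlite3 module with SQLite as a stand-in for MySQL (an assumption for the sake of a runnable example). The derived column is aliased explicitly (dr._date AS _date) and '16:00:00' is used in place of '16:00'; both are small adjustments for SQLite, not part of the original answer.

```python
import sqlite3

# The answer's test bed, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
    CREATE TABLE date_range (date_id INTEGER, _date TEXT);
    INSERT INTO messages VALUES
        (1, 1, '2011-01-22', '06:23:11'),
        (2, 1, '2011-01-23', '16:17:03'),
        (3, 1, '2011-01-23', '17:05:05');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)

QUERY = """
SELECT p._date, p.user_id,
       COALESCE(m.before16, 0) AS b16, COALESCE(m.after16, 0) AS a16
  FROM
      -- every (user, date) pair: users seen in messages x all dates
      (SELECT DISTINCT user_id, dr._date AS _date
         FROM messages m, date_range dr) p
  LEFT JOIN
      -- per-day counts, only for days that actually have messages
      (SELECT user_id, _date,
              SUM(_time <= '16:00:00') AS before16,
              SUM(_time >  '16:00:00') AS after16
         FROM messages
        GROUP BY user_id, _date) m
    ON p.user_id = m.user_id AND p._date = m._date
 ORDER BY p.user_id, p._date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the user/date pairs are built from the messages table itself, a user with no messages at all would still be missing; building the pairs from a users table instead avoids that.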

EDIT:

Your initial query is left as is; I hope it doesn't require any explanation.
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns the Cartesian product (a CROSS JOIN) of the two tables, which gives me every required date for each user in question. Since I'm interested in each pair only once, I use the DISTINCT clause. Try this query with and without it.
Then I use a LEFT JOIN on the two sub-selects.

This join works as follows: first, an INNER join is performed, i.e. all rows whose fields match the ON condition are returned. Then, for each row of the left-side relation that has no match on the right side, a row is returned with NULLs for the right-side columns (hence the name LEFT JOIN: the left relation is always present, and the right side may be NULL). This join does what you want: it returns user_id + date combinations even when a given user sent no messages on a given date. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right).
coalesce() is used to replace NULL with zero.
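The "with and without DISTINCT" experiment and the NULLs that coalesce() replaces can both be checked with a small sketch. It uses Python's sqlite3 module on the same test bed, with SQLite standing in for MySQL (an assumption); only the before16 column is computed to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
    CREATE TABLE date_range (date_id INTEGER, _date TEXT);
    INSERT INTO messages VALUES
        (1, 1, '2011-01-22', '06:23:11'),
        (2, 1, '2011-01-23', '16:17:03'),
        (3, 1, '2011-01-23', '17:05:05');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)

# The cross join pairs every message row with every date: 3 x 4 = 12 rows,
# so user 1 appears three times per date (once per message it sent).
all_pairs = conn.execute(
    "SELECT user_id, dr._date FROM messages m, date_range dr"
).fetchall()
distinct_pairs = conn.execute(
    "SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr"
).fetchall()
print(len(all_pairs), len(distinct_pairs))  # 12 4

# Without coalesce(), dates with no messages come back as NULL (None in Python).
raw = conn.execute(
    """
    SELECT p._date, m.before16
      FROM (SELECT DISTINCT user_id, dr._date AS _date
              FROM messages m, date_range dr) p
      LEFT JOIN (SELECT user_id, _date, SUM(_time <= '16:00:00') AS before16
                   FROM messages GROUP BY user_id, _date) m
        ON p.user_id = m.user_id AND p._date = m._date
     ORDER BY p._date
    """
).fetchall()
print(raw)
# [('2011-01-21', None), ('2011-01-22', 1), ('2011-01-23', 0), ('2011-01-24', None)]
```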

I hope this clarifies how this query works.

score 2 · Accepted Answer

Give this a try:

select u.user_id, u._date,
    -- coalesce turns NULL (no messages that day) into 0
    coalesce(sum(m._time <= '16:00'), 0) as before16,
    coalesce(sum(m._time > '16:00'), 0) as after16
from (
    select m.user_id, d._date
    from messages m
        cross join date_range d
    group by m.user_id, d._date
    ) u
    left join messages m on u.user_id=m.user_id
                        and u._date=m._date
group by u.user_id, u._date

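The inner query just builds the set of all possible/required user-date pairs. Using a users table would be more efficient, but you didn't mention having one, so I won't assume it. Beyond that, you only need the left join so that pairs without matching messages are not dropped.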
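One detail worth checking with the left join: a user/date pair with no messages reaches the aggregate as a single row whose _time is NULL, and SUM over only NULLs is NULL, not 0. The sketch below shows this with Python's sqlite3 module on the first answer's test bed (SQLite standing in for MySQL, an assumption), comparing a bare sum with one wrapped in coalesce(). The run() helper is hypothetical and interpolates only fixed, trusted expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
    CREATE TABLE date_range (date_id INTEGER, _date TEXT);
    INSERT INTO messages VALUES
        (1, 1, '2011-01-22', '06:23:11'),
        (2, 1, '2011-01-23', '16:17:03'),
        (3, 1, '2011-01-23', '17:05:05');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)


def run(before_expr, after_expr):
    """Run the answer's query shape with the given (trusted) aggregate expressions."""
    return conn.execute(
        f"""
        SELECT u.user_id, u._date, {before_expr} AS before16, {after_expr} AS after16
          FROM (SELECT m.user_id, d._date AS _date
                  FROM messages m CROSS JOIN date_range d
                 GROUP BY m.user_id, d._date) u
          LEFT JOIN messages m ON u.user_id = m.user_id AND u._date = m._date
         GROUP BY u.user_id, u._date
         ORDER BY u.user_id, u._date
        """
    ).fetchall()


bare = run("SUM(m._time <= '16:00:00')", "SUM(m._time > '16:00:00')")
fixed = run(
    "COALESCE(SUM(m._time <= '16:00:00'), 0)",
    "COALESCE(SUM(m._time > '16:00:00'), 0)",
)
print(bare[0])   # (1, '2011-01-21', None, None)
print(fixed[0])  # (1, '2011-01-21', 0, 0)
```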

Edit, with a more detailed explanation: let's take the query apart.

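Start with the innermost query: the goal is to get a list of all required dates for each user. Since there is a users table and a dates table, it looks like this: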

select distinct u.user_id, d.d_date
from users u
  cross join date_range d

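The key here is the cross join, which takes every row in the users table and pairs it with every row in the date_range table. The distinct keyword is really just shorthand for a group by on all columns; it is here in case there is duplicate data.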
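A quick sketch of the cross join and of distinct acting as a group by on all columns, using Python's sqlite3 module with the question's users and date_range tables (SQLite standing in for MySQL, an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, user_date TEXT);
    CREATE TABLE date_range (date_id INTEGER, d_date TEXT);
    INSERT INTO users VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)

# Every user paired with every date: 3 users x 4 dates = 12 rows.
pairs = conn.execute(
    """
    SELECT DISTINCT u.user_id, d.d_date
      FROM users u
     CROSS JOIN date_range d
     ORDER BY u.user_id, d.d_date
    """
).fetchall()

# The same set written as a group by over every selected column.
grouped = conn.execute(
    """
    SELECT u.user_id, d.d_date
      FROM users u
     CROSS JOIN date_range d
     GROUP BY u.user_id, d.d_date
     ORDER BY u.user_id, d.d_date
    """
).fetchall()

print(len(pairs))  # 12
print(pairs[:4])
# [(1, '2011-01-21'), (1, '2011-01-22'), (1, '2011-01-23'), (1, '2011-01-24')]
```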

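Note that there are several other ways to get the same result set (such as in my original query), but this is probably the simplest, both logically and computationally.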

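Really, the only other steps are adding the left join (which relates every row we got above to all available data, without dropping anything that has no data) and a group by with essentially the same components as the select before it. So, putting it all together looks like this: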

select t.user_id, t.d_date,
  -- coalesce turns NULL (no messages that day) into 0
  coalesce(sum(m.m_time <= '16:00'), 0) as before16,
  coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
    select distinct u.user_id, d.d_date
    from users u
      cross join date_range d
  ) t
  left join messages m on t.user_id = m.user_id
                      and t.d_date = m.m_date
group by t.user_id, t.d_date
order by t.user_id, t.d_date
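To check the combined query against the desired output in the question, here is a minimal sketch using Python's sqlite3 module with all three of the question's tables (SQLite standing in for MySQL, an assumption). Each sum is wrapped in COALESCE so that empty days report 0 rather than NULL, and '16:00:00' is used for the text comparison.

```python
import sqlite3

MESSAGES = [
    (1, 1, "2011-01-22", "06:23:11"), (2, 1, "2011-01-23", "16:17:03"),
    (3, 1, "2011-01-23", "17:05:45"), (4, 2, "2011-01-22", "23:58:13"),
    (5, 2, "2011-01-23", "23:59:32"), (6, 2, "2011-01-24", "21:02:41"),
    (7, 3, "2011-01-22", "13:45:00"), (8, 3, "2011-01-23", "13:22:34"),
    (9, 3, "2011-01-23", "18:22:34"), (10, 3, "2011-01-24", "02:22:22"),
    (11, 3, "2011-01-24", "13:12:00"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, m_date TEXT, m_time TEXT);
    CREATE TABLE users (user_id INTEGER, user_date TEXT);
    CREATE TABLE date_range (date_id INTEGER, d_date TEXT);
    INSERT INTO users VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", MESSAGES)

result = conn.execute(
    """
    SELECT t.user_id, t.d_date,
           COALESCE(SUM(m.m_time <= '16:00:00'), 0) AS before16,
           COALESCE(SUM(m.m_time >  '16:00:00'), 0) AS after16
      FROM (SELECT DISTINCT u.user_id, d.d_date
              FROM users u CROSS JOIN date_range d) t
      LEFT JOIN messages m ON t.user_id = m.user_id
                          AND t.d_date = m.m_date
     GROUP BY t.user_id, t.d_date
     ORDER BY t.user_id, t.d_date
    """
).fetchall()

for row in result:
    print(*row)
```

The twelve printed rows match the desired output table in the question, including the all-zero rows for 2011-01-21 and for user 1 on 2011-01-24.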

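Based on some of the other comments/questions, note that every table and subquery is now referenced with an explicit prefix (which is quite easy, since we no longer use any table more than once): u for the users table, d for the date_range table, t for the subquery containing the dates for each user, and m for the messages table. This is probably where my first explanation fell a bit short: I used the messages table twice, both times with the same prefix. It works there because of the context of the two uses (one is inside a subquery), but it's probably not best practice.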

score 1 · Accepted Answer

It's not neat, but if you have a users table, then maybe something like this:

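SELECT
    user_id,
    _date,
    SUM(before16) AS before16,
    SUM(after16) AS after16
FROM (
    SELECT 
        user_id, 
        _date, 
        SUM(_time <= '16:00') AS before16, 
        SUM(_time > '16:00') AS after16 
    FROM messages 
    GROUP BY user_id, _date
    UNION ALL
    -- one zero row for every user/date pair; the outer SUM merges it
    -- with the real counts when that day has messages
    SELECT
        users.user_id,
        date_range._date,
        0 AS before16, 
        0 AS after16 
    FROM
        users,
        date_range
) AS combined
GROUP BY user_id, _date
ORDER BY user_id, _date ASC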
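Why the zero rows need an outer aggregation: a plain UNION only removes rows that are identical in every column, so (1, 2011-01-22, 1, 0) and (1, 2011-01-22, 0, 0) both survive. A minimal sketch with Python's sqlite3 module (SQLite standing in for MySQL, an assumption), loading the question's data under the _date/_time column names this answer uses:

```python
import sqlite3

MESSAGES = [
    (1, 1, "2011-01-22", "06:23:11"), (2, 1, "2011-01-23", "16:17:03"),
    (3, 1, "2011-01-23", "17:05:45"), (4, 2, "2011-01-22", "23:58:13"),
    (5, 2, "2011-01-23", "23:59:32"), (6, 2, "2011-01-24", "21:02:41"),
    (7, 3, "2011-01-22", "13:45:00"), (8, 3, "2011-01-23", "13:22:34"),
    (9, 3, "2011-01-23", "18:22:34"), (10, 3, "2011-01-24", "02:22:22"),
    (11, 3, "2011-01-24", "13:12:00"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
    CREATE TABLE users (user_id INTEGER, user_date TEXT);
    CREATE TABLE date_range (date_id INTEGER, _date TEXT);
    INSERT INTO users VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", MESSAGES)

COUNTS = """
    SELECT user_id, _date,
           SUM(_time <= '16:00:00') AS before16,
           SUM(_time >  '16:00:00') AS after16
      FROM messages GROUP BY user_id, _date
"""
ZEROS = "SELECT users.user_id, date_range._date, 0, 0 FROM users, date_range"

# Plain UNION: 8 real rows + 12 zero rows, none identical, so 20 rows remain.
plain = conn.execute(f"{COUNTS} UNION {ZEROS}").fetchall()

# UNION ALL wrapped in an outer SUM collapses each pair to a single row.
combined = conn.execute(
    f"""
    SELECT user_id, _date, SUM(before16) AS before16, SUM(after16) AS after16
      FROM ({COUNTS} UNION ALL {ZEROS}) AS combined
     GROUP BY user_id, _date
     ORDER BY user_id, _date
    """
).fetchall()

print(len(plain), len(combined))  # 20 12
```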

score 0 · Accepted Answer

chezy525's solution works well; I ported it to PostgreSQL and removed/renamed some of the aliases:

select users_and_dates.user_id, users_and_dates._date,
    SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
    SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
    select messages.user_id, date_range._date
    from messages 
        cross join date_range 
    group by messages.user_id, date_range._date
    ) users_and_dates
    left join messages  on users_and_dates.user_id=messages.user_id
                    and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;

I ran it on my machine and it works perfectly.
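A side effect of this port is that the CASE expressions already yield 0 for days without messages: the left join supplies a NULL _time, the comparison is unknown, so the ELSE 0 branch is taken. A bare SUM(_time <= ...) would instead return NULL. The sketch below checks both points with Python's sqlite3 module (SQLite standing in for PostgreSQL, an assumption; the CASE syntax is standard SQL) on the first answer's test bed. The ORDER BY and the explicit AS _date alias are additions for a deterministic, SQLite-friendly run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A single NULL time stands in for the row a LEFT JOIN produces
# when a user/date pair has no messages.
bare, with_case = conn.execute(
    """
    SELECT SUM(t <= '16:00:00'),
           SUM(CASE WHEN t <= '16:00:00' THEN 1 ELSE 0 END)
      FROM (SELECT NULL AS t)
    """
).fetchone()
print(bare, with_case)  # None 0

conn.executescript(
    """
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
    CREATE TABLE date_range (date_id INTEGER, _date TEXT);
    INSERT INTO messages VALUES
        (1, 1, '2011-01-22', '06:23:11'),
        (2, 1, '2011-01-23', '16:17:03'),
        (3, 1, '2011-01-23', '17:05:05');
    INSERT INTO date_range VALUES
        (1, '2011-01-21'), (1, '2011-01-22'),
        (1, '2011-01-23'), (1, '2011-01-24');
    """
)

ported = conn.execute(
    """
    SELECT users_and_dates.user_id, users_and_dates._date,
           SUM(CASE WHEN _time <= '16:00:00' THEN 1 ELSE 0 END) AS before16,
           SUM(CASE WHEN _time >  '16:00:00' THEN 1 ELSE 0 END) AS after16
      FROM (SELECT messages.user_id, date_range._date AS _date
              FROM messages CROSS JOIN date_range
             GROUP BY messages.user_id, date_range._date) users_and_dates
      LEFT JOIN messages ON users_and_dates.user_id = messages.user_id
                        AND users_and_dates._date = messages._date
     GROUP BY users_and_dates.user_id, users_and_dates._date
     ORDER BY users_and_dates.user_id, users_and_dates._date
    """
).fetchall()
print(ported)
```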
