sql - SQL select from generate_series: filtering by user_id removes the series?

Question

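I have a complex (for me) SQL query in PostgreSQL 9.2.4 that uses generate_series and several joins. I need to sum the reps of all exercises on a given day from the exercises table, and make sure those exercises belong to workouts completed by the current user. Finally, I need to join that table to a series so that missing dates are shown as well (using generate_series).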

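My idea was to select the series in the FROM clause and then join it to a subquery holding the result of an inner join between the exercises and workouts tables. For example, I have the following query: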

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

This gives the following output:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-27 |   0 | {NULL}
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-29 |   0 | {NULL}
 2013-04-30 |  20 | {50}
 2013-05-01 |   0 | {NULL}
 2013-05-02 |   0 | {NULL}
 2013-05-03 |   0 | {NULL}
 2013-05-04 |   0 | {NULL}
 2013-05-05 |   0 | {NULL}
 2013-05-06 |   0 | {NULL}
 2013-05-07 |  40 | {51,51}
 2013-05-08 |   0 | {NULL}
 2013-05-09 |   0 | {NULL}
 2013-05-10 |   0 | {NULL}
 2013-05-11 |   0 | {NULL}
 2013-05-12 |   0 | {NULL}
 2013-05-13 |   0 | {NULL}
 2013-05-14 |   0 | {NULL}
 2013-05-15 |   0 | {NULL}
 2013-05-16 |  20 | {52}
 2013-05-17 |   0 | {NULL}
 2013-05-18 |   0 | {NULL}
 2013-05-19 |   0 | {NULL}
(23 rows)

However, I want to filter by a condition, for example:

WHERE workouts.user_id = 5

But if I add that WHERE clause to the query above, the output looks like this:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-30 |  20 | {50}
 2013-05-07 |  40 | {51,51}
 2013-05-16 |  20 | {52}
(4 rows)

The series disappears.

How can I filter by user_id and keep the series? Any help would be much appreciated.

score 2 · Accepted Answer

I have a complex (for me) SQL query...

Indeed, you do. But it doesn't have to be:

SELECT s.day
      ,COALESCE(sum(w.reps), 0) AS sum_reps  -- assuming reps comes from workouts
      ,array_agg(e.workout_id)  AS ids
FROM   exercises e
JOIN   workouts  w ON w.id = e.workout_id AND w.user_id = 5
RIGHT  JOIN (
   SELECT now()::date + generate_series(-22, 0) AS day
   ) s ON s.day = e.created_at::date 
GROUP  BY 1
ORDER  BY 1;
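A minimal sketch of why the WHERE condition in the question made the empty days vanish, using a hypothetical temp table `demo_rows(day, user_id, reps)` in place of the joined exercises/workouts rows. After an outer join, days without a match carry NULL in every column of the other side, so `WHERE user_id = 5` discards them; moved into the join clause (as the query above does), the same condition only decides which rows match, and every day of the series survives.

```sql
-- Hypothetical stand-in for the joined exercises/workouts rows.
CREATE TEMP TABLE demo_rows (day date, user_id int, reps int);
INSERT INTO demo_rows VALUES (current_date - 1, 5, 10)
                           , (current_date - 1, 7, 99);

-- Filter in WHERE: unmatched days have d.user_id = NULL after the
-- LEFT JOIN, so the filter removes them. Returns 1 row (yesterday, 10).
SELECT s.day, COALESCE(sum(d.reps), 0) AS sum_reps
FROM  (SELECT current_date + g.n AS day
       FROM   generate_series(-2, 0) AS g(n)) s
LEFT   JOIN demo_rows d ON d.day = s.day
WHERE  d.user_id = 5
GROUP  BY s.day
ORDER  BY s.day;

-- Filter in the join clause: it only restricts which rows of demo_rows
-- match. Returns 3 rows with sum_reps 0, 10, 0.
SELECT s.day, COALESCE(sum(d.reps), 0) AS sum_reps
FROM  (SELECT current_date + g.n AS day
       FROM   generate_series(-2, 0) AS g(n)) s
LEFT   JOIN demo_rows d ON d.day = s.day AND d.user_id = 5
GROUP  BY s.day
ORDER  BY s.day;
```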

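Key points:

RIGHT [OUTER] JOIN is the reverse twin of LEFT JOIN. Since joins are applied from left to right, you don't need parentheses when you write it this way.
Never use the basic type and function name date as an identifier. I use day instead.
Update: to avoid NULL as the result of the aggregate / window function, wrap sum() in an outer COALESCE, as demonstrated: COALESCE(sum(reps), 0), instead of the inner COALESCE from your query:
```
sum(COALESCE(reps, 0))
```
You don't need date_trunc() at all. The series column is a date to begin with, so this is redundant:
```
date_trunc('day', s.day)::date AS day
```
Instead of the complicated and relatively expensive combination of DISTINCT + window functions, you can simply use GROUP BY in this case.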
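For comparison, a sketch of the same query written with LEFT JOIN: the series goes first, and the parentheses group the inner join so the user filter applies before the outer join. It should return the same rows as the RIGHT JOIN query above. The minimal schema and data at the top are hypothetical, only there so the sketch runs on its own; like the answer, it assumes reps comes from workouts.

```sql
-- Hypothetical minimal schema and data; the real tables have more columns.
CREATE TEMP TABLE workouts  (id int PRIMARY KEY, user_id int, reps int);
CREATE TEMP TABLE exercises (workout_id int, created_at timestamp);
INSERT INTO workouts  VALUES (1, 5, 20), (2, 7, 99);
INSERT INTO exercises VALUES (1, current_date - 1 + time '10:00')  -- user 5
                           , (2, current_date - 1 + time '11:00'); -- user 7

-- Series on the left, parenthesized inner join (with the user filter)
-- on the right. 23 rows: yesterday has sum_reps 20 and ids {1},
-- every other day has 0 and {NULL}.
SELECT s.day
     , COALESCE(sum(w.reps), 0) AS sum_reps  -- assuming reps comes from workouts
     , array_agg(e.workout_id)  AS ids
FROM  (SELECT current_date + g.n AS day
       FROM   generate_series(-22, 0) AS g(n)) s
LEFT   JOIN (exercises e
             JOIN workouts w ON w.id = e.workout_id AND w.user_id = 5)
       ON e.created_at::date = s.day
GROUP  BY s.day
ORDER  BY s.day;
```

Both forms are equivalent; which one reads better is mostly a matter of taste.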

Aggregate functions and `COALESCE()`

There has been some confusion about this in several questions recently.

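Generally, sum() and other aggregate functions ignore NULL values: the result is the same as if those values weren't there at all. There are some special cases, though. The manual advises: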

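It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.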
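Both behaviours in a minimal sketch, using inline VALUES lists instead of real tables: NULL inputs are simply skipped, while an empty input makes sum() and array_agg() return NULL, which COALESCE turns into 0 and an empty array.

```sql
-- NULL inputs are ignored: sum_x = 3, ct_x = 2, ct_all = 3
SELECT sum(x) AS sum_x, count(x) AS ct_x, count(*) AS ct_all
FROM  (VALUES (1), (NULL), (2)) v(x);

-- No input rows: sum and array_agg return NULL, count returns 0;
-- COALESCE substitutes 0 and an empty array.
SELECT sum(x)                       AS sum_x         -- NULL
     , count(x)                     AS ct_x          -- 0
     , array_agg(x)                 AS arr           -- NULL
     , COALESCE(sum(x), 0)          AS sum_or_0      -- 0
     , COALESCE(array_agg(x), '{}') AS arr_or_empty  -- {}
FROM  (VALUES (1), (2)) v(x)
WHERE  false;
```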

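This demo should clear things up by covering the corner cases:

1 table with no rows.
3 tables with 1 row, holding (NULL / 0 / 1).
3 tables with 2 rows, holding NULL and (NULL / 0 / 1).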

Test setup

-- no rows
CREATE TABLE t_empty (i int);
-- INSERT nothing

CREATE TABLE t_0 (i int);
CREATE TABLE t_1 (i int);
CREATE TABLE t_n (i int);

-- 1 row
INSERT INTO t_0 VALUES (0);
INSERT INTO t_1 VALUES (1);
INSERT INTO t_n VALUES (NULL);

CREATE TABLE t_0n (i int);
CREATE TABLE t_1n (i int);
CREATE TABLE t_nn (i int);

-- 2 rows
INSERT INTO t_0n VALUES (0),    (NULL);
INSERT INTO t_1n VALUES (1),    (NULL);
INSERT INTO t_nn VALUES (NULL), (NULL);

Query

SELECT 't_empty'           AS tbl
      ,count(*)            AS ct_all
      ,count(i)            AS ct_i
      ,sum(i)              AS simple_sum
      ,sum(COALESCE(i, 0)) AS inner_coalesce
      ,COALESCE(sum(i), 0) AS outer_coalesce
FROM   t_empty

UNION ALL
SELECT 't_0',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0
UNION ALL
SELECT 't_1',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1
UNION ALL
SELECT 't_n',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_n

UNION ALL
SELECT 't_0n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0n
UNION ALL
SELECT 't_1n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1n
UNION ALL
SELECT 't_nn', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_nn;

Result

   tbl   | ct_all | ct_i | simple_sum | inner_coalesce | outer_coalesce
---------+--------+------+------------+----------------+----------------
 t_empty |      0 |    0 |     <NULL> |         <NULL> |              0
 t_0     |      1 |    1 |          0 |              0 |              0
 t_1     |      1 |    1 |          1 |              1 |              1
 t_n     |      1 |    0 |     <NULL> |              0 |              0
 t_0n    |      2 |    1 |          0 |              0 |              0
 t_1n    |      2 |    1 |          1 |              1 |              1
 t_nn    |      2 |    0 |     <NULL> |              0 |              0

-> SQL Fiddle

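So my initial advice was sloppy. You may need COALESCE with sum().
But if you do, use an outer COALESCE. The inner COALESCE in the original query doesn't cover all corner cases and is rarely useful.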

score 1 · Accepted Answer

Instead of fetching all rows from the workouts table, you can apply the condition right there, in a subquery:

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN (select * from workouts where user_id = 5) workouts  -- alias must match workouts.id below
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

I think this should give you the desired output.
