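I have a complicated (for me) SQL query in PostgreSQL 9.2.4 that uses generate_series and several joins. I need to sum the reps of all exercises in the exercises table for a given day, and make sure those exercises belong to workouts completed by the current user. Finally, I need to join that result to a series (built with generate_series) so that missing dates show up.

My idea is to select the series in the FROM clause and then left join it to a subquery holding the result of an inner join between the exercises and workouts tables. For example, I have the following query: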

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

This gives the following output:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-27 |   0 | {NULL}
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-29 |   0 | {NULL}
 2013-04-30 |  20 | {50}
 2013-05-01 |   0 | {NULL}
 2013-05-02 |   0 | {NULL}
 2013-05-03 |   0 | {NULL}
 2013-05-04 |   0 | {NULL}
 2013-05-05 |   0 | {NULL}
 2013-05-06 |   0 | {NULL}
 2013-05-07 |  40 | {51,51}
 2013-05-08 |   0 | {NULL}
 2013-05-09 |   0 | {NULL}
 2013-05-10 |   0 | {NULL}
 2013-05-11 |   0 | {NULL}
 2013-05-12 |   0 | {NULL}
 2013-05-13 |   0 | {NULL}
 2013-05-14 |   0 | {NULL}
 2013-05-15 |   0 | {NULL}
 2013-05-16 |  20 | {52}
 2013-05-17 |   0 | {NULL}
 2013-05-18 |   0 | {NULL}
 2013-05-19 |   0 | {NULL}
(23 rows)

However, I want to filter on a condition, for example:

WHERE workouts.user_id = 5

If I add that WHERE clause to the query above, the output becomes:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-30 |  20 | {50}
 2013-05-07 |  40 | {51,51}
 2013-05-16 |  20 | {52}
(4 rows)

The series is gone.

How can I filter by user_id and still keep the series? Any help would be much appreciated.

2 Answers

"I have a complicated (for me) SQL query ..."

Indeed you have. But it doesn't have to be:

SELECT s.day
      ,COALESCE(sum(w.reps), 0) AS sum_reps  -- assuming reps comes from workouts
      ,array_agg(e.workout_id)  AS ids
FROM   exercises e
JOIN   workouts  w ON w.id = e.workout_id AND w.user_id = 5
RIGHT  JOIN (
   SELECT now()::date + generate_series(-22, 0) AS day
   ) s ON s.day = e.created_at::date 
GROUP  BY 1
ORDER  BY 1;

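Major points:

  • RIGHT [OUTER] JOIN is the reverse twin of LEFT JOIN. Since joins are applied left to right, you don't need parentheses this way. Note that the user_id condition sits in the join condition of the inner join, not in a WHERE clause, so days without matching rows survive the outer join.

  • Never use the basic type and function name date as an identifier. I used day instead.

  • Update: To avoid NULL as the result of the aggregate / window function sum(), use an outer COALESCE as demonstrated: COALESCE(sum(reps), 0), not:

    sum(COALESCE(reps, 0))
  • You don't need date_trunc() at all. s.day is a date to begin with, so this is redundant:

    date_trunc('day', s.day)::date AS day
  • In this case you can just use a simple GROUP BY instead of the complicated and comparatively expensive combination of DISTINCT + window functions.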
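
To see why the placement of the filter matters, here is a minimal, self-contained sketch. The CTEs days and ex and their contents are made up for illustration and merely stand in for the generated series and the joined exercises / workouts rows. A condition in WHERE is applied after the outer join and discards the NULL-extended rows; the same condition in the join condition only restricts which rows can match.

```sql
-- Hypothetical miniature data: three days, one exercise row each for users 5 and 7.
WITH days(day) AS (
   VALUES (date '2013-05-01'), (date '2013-05-02'), (date '2013-05-03')
), ex(day, user_id, reps) AS (
   VALUES (date '2013-05-01', 5, 20)
        , (date '2013-05-02', 7, 10)
)
-- Filter in WHERE: rows where e.user_id is NULL are dropped, so empty days vanish.
SELECT 'filter in WHERE' AS variant, d.day, COALESCE(sum(e.reps), 0) AS sum_reps
FROM   days d
LEFT   JOIN ex e ON e.day = d.day
WHERE  e.user_id = 5
GROUP  BY d.day
UNION ALL
-- Filter in the join condition: every day is kept, unmatched days sum to 0.
SELECT 'filter in ON', d.day, COALESCE(sum(e.reps), 0)
FROM   days d
LEFT   JOIN ex e ON e.day = d.day AND e.user_id = 5
GROUP  BY d.day
ORDER  BY 1, 2;
```

The WHERE variant returns a single row (2013-05-01 with 20 reps), while the ON variant returns all three days with 20, 0 and 0 reps.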

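Aggregate functions and COALESCE()

There has been some confusion about this in a few recent questions.

Generally, sum() and other aggregate functions ignore NULL values. The result is the same as if the value didn't exist at all. However, there are some special cases. The manual advises:

"It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary."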
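
As a minimal sketch of the substitution the manual describes, covering array_agg as well as sum: the derived table t and its column i are invented for illustration, and WHERE false simply guarantees that the aggregates see no input rows.

```sql
-- Aggregates over zero rows: sum() and array_agg() yield NULL unless wrapped.
SELECT sum(i)                       AS plain_sum  -- NULL
      ,COALESCE(sum(i), 0)          AS safe_sum   -- 0
      ,array_agg(i)                 AS plain_arr  -- NULL
      ,COALESCE(array_agg(i), '{}') AS safe_arr   -- {} (empty int[])
FROM  (SELECT 1 AS i WHERE false) t;
```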

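The following demo should clarify by demonstrating corner cases:

  • 1 table with no rows.
  • 3 tables with 1 row, holding NULL / 0 / 1 respectively.
  • 3 tables with 2 rows, holding NULL plus NULL / 0 / 1 respectively.

Test setup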

-- no rows
CREATE TABLE t_empty (i int);
-- INSERT nothing

CREATE TABLE t_0 (i int);
CREATE TABLE t_1 (i int);
CREATE TABLE t_n (i int);

-- 1 row
INSERT INTO t_0 VALUES (0);
INSERT INTO t_1 VALUES (1);
INSERT INTO t_n VALUES (NULL);

CREATE TABLE t_0n (i int);
CREATE TABLE t_1n (i int);
CREATE TABLE t_nn (i int);

-- 2 rows
INSERT INTO t_0n VALUES (0),    (NULL);
INSERT INTO t_1n VALUES (1),    (NULL);
INSERT INTO t_nn VALUES (NULL), (NULL);

Query

SELECT 't_empty'           AS tbl
      ,count(*)            AS ct_all
      ,count(i)            AS ct_i
      ,sum(i)              AS simple_sum
      ,sum(COALESCE(i, 0)) AS inner_coalesce
      ,COALESCE(sum(i), 0) AS outer_coalesce
FROM   t_empty

UNION ALL
SELECT 't_0',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0
UNION ALL
SELECT 't_1',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1
UNION ALL
SELECT 't_n',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_n

UNION ALL
SELECT 't_0n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0n
UNION ALL
SELECT 't_1n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1n
UNION ALL
SELECT 't_nn', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_nn;

Result

   tbl   | ct_all | ct_i | simple_sum | inner_coalesce | outer_coalesce
---------+--------+------+------------+----------------+----------------
 t_empty |      0 |    0 |     <NULL> |         <NULL> |              0
 t_0     |      1 |    1 |          0 |              0 |              0
 t_1     |      1 |    1 |          1 |              1 |              1
 t_n     |      1 |    0 |     <NULL> |              0 |              0
 t_0n    |      2 |    1 |          0 |              0 |              0
 t_1n    |      2 |    1 |          1 |              1 |              1
 t_nn    |      2 |    0 |     <NULL> |              0 |              0

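-> SQL Fiddle

Ergo, my initial advice was sloppy. You may need COALESCE with sum().
But if you do, use an outer COALESCE. The inner COALESCE in the original query doesn't cover all corner cases (it still returns NULL when there are no rows at all, as t_empty shows) and is rarely useful.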

Answered 2013-05-20T21:20:14.400

Instead of pulling all rows from the workouts table, you can apply the condition right there, inside the join, like this:

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN (select * from workouts where user_id = 5) workouts
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

I think this should give you the desired output.

Answered 2013-05-20T19:41:14.583