sql - "distinct on" with group by in postgres

Question

I have the following records:

id  run_hour               performance_hour      value
2  "2017-06-25 09:00:00"  "2017-06-25 07:00:00"    6
2  "2017-06-25 09:00:00"  "2017-06-25 08:00:00"    5
1  "2017-06-25 09:00:00"  "2017-06-25 08:00:00"    5
2  "2017-06-25 08:00:00"  "2017-06-25 07:00:00"    5
1  "2017-06-25 08:00:00"  "2017-06-25 07:00:00"    5

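We run a job every hour that looks at the results for each id for the current hour and the previous hours.

We only insert a new record when the value has changed compared with the previous hour's run (we don't want to overwrite the value, because we want to measure it after 1 hour, 2 hours, and so on).

For each id, I want to sum the latest available values (ordered by run_hour).

In the example above, there is no record for the 9:00 run and the 7:00 performance hour for ad 1 (id 1), because the value is the same as for the 8:00 run and the 7:00 performance hour.

In the example above, if I ask for the sum of the values for run 2017-06-25 09:00:00, I expect to get: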

id, value
1   10
2   11

For id 1 the result is 10: (run_hour<2017-06-25 08:00:00> + run_hour<2017-06-25 09:00:00>), and for id 2 it is 11: (run_hour<2017-06-25 09:00:00> + run_hour<2017-06-25 09:00:00>).

I wrote the following query:

select distinct on (id, run_hour) id, sum(value)
from metrics
where run_hour <= '2017-06-25 09:00'
  and performance_hour >= '2017-06-25 07:00'
  and performance_hour < '2017-06-25 09:00'
group by id
order by id, run_hour

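But I get an error saying that run_hour must also appear in the GROUP BY clause. If I add it, I get incorrect data: the result also includes earlier hours that I don't need. I only need the latest hour that has data.

How can I use "distinct on" with group by?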

score 2 · Accepted Answer

The task is quite complex. Suppose you want the performance hours from 7:00 to 9:00 from the following data:

id  run_hour               performance_hour      value
2  "2017-06-25 09:00:00"  "2017-06-25 06:00:00"    6
2  "2017-06-25 09:00:00"  "2017-06-25 10:00:00"    5

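The expected result would be 18 (6 for 7:00 + 6 for 8:00 + 6 for 9:00), all based on the 6:00 record, which itself lies outside the desired time range.

We need a recursive CTE that walks, for each id, from the first wanted performance hour to the last wanted performance hour. That way we build the records that don't exist in the table, which we can then sum up.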

with recursive cte(id, run_hour, performance_hour, value) as
(
  select *
  from
  (
    select distinct on (id) 
      id, 
      run_hour,
      greatest(performance_hour, timestamp '2017-06-25 07:00') as performance_hour, 
      value
    from metrics
    where run_hour = timestamp '2017-06-25 09:00' 
      and performance_hour <= timestamp '2017-06-25 07:00'
    order by id, metrics.performance_hour desc
  ) start_by_id
  union all
  select 
    cte.id, 
    cte.run_hour,
    cte.performance_hour + interval '1 hour' as performance_hour,
    coalesce(m.value, cte.value) as value
  from cte
  left join metrics m on m.id = cte.id
                      and m.run_hour = cte.run_hour
                      and m.performance_hour = cte.performance_hour + interval '1 hour'
  where cte.performance_hour < timestamp '2017-06-25 09:00'
)
select id, sum(value)
from cte
group by id;

Rextester link: http://rextester.com/PHC88770
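To make the gap-filling idea easier to follow, here is a minimal Python sketch of what the recursive CTE above does for a single id and a single run_hour. It is only an illustration, not part of the query: the function name `forward_filled_sum` and the dict-based input are assumptions made for this example.

```python
from datetime import datetime, timedelta


def forward_filled_sum(records, start, end):
    """Sum one value per hour from start to end (inclusive), carrying the
    most recent known value forward over hours that have no record.

    records maps performance_hour -> value for a single id and run_hour
    (a hypothetical in-memory stand-in for the metrics table).
    """
    # Anchor: the latest record at or before the window start, like the
    # non-recursive branch of the CTE.
    earlier = [hour for hour in records if hour <= start]
    if not earlier:
        return None  # the CTE would produce no rows for this id
    current = records[max(earlier)]

    total = current
    hour = start
    # Recursive step: move forward one hour at a time and keep the previous
    # value when no record exists, like coalesce(m.value, cte.value).
    while hour < end:
        hour += timedelta(hours=1)
        current = records.get(hour, current)
        total += current
    return total


if __name__ == "__main__":
    records = {
        datetime(2017, 6, 25, 6): 6,
        datetime(2017, 6, 25, 10): 5,
    }
    print(forward_filled_sum(records, datetime(2017, 6, 25, 7), datetime(2017, 6, 25, 9)))
```

With the two example rows from this answer, the 6:00 value is carried over 7:00, 8:00 and 9:00, which gives the 18 mentioned above.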

score 1 · Accepted Answer

You want the distinct on before the group by:

select id, sum(value)
from (select distinct on (id, run_hour) m.*
      from metrics m
      where run_hour <= '2017-06-25 09:00' and
            performance_hour >= '2017-06-25 07:00' and
            performance_hour < '2017-06-25 09:00'
      order by id, run_hour, performance_hour desc
     ) m
group by id;
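It may be worth checking the choice of distinct on key against the sample rows in the question. By my reading, keying on (id, run_hour) as above yields 10 for id 2, while keeping the latest run per (id, performance_hour) yields the expected 10 and 11. In PostgreSQL that would be `distinct on (id, performance_hour)` with `order by id, performance_hour, run_hour desc`. The sketch below is an assumption-laden check rather than a definitive solution: it uses Python's sqlite3 module so it runs without a server, and because SQLite has no DISTINCT ON it substitutes `row_number()` (which needs SQLite 3.25 or newer). The function name `latest_value_sums` is made up for this example.

```python
import sqlite3

# Sample rows from the question: (id, run_hour, performance_hour, value).
ROWS = [
    (2, "2017-06-25 09:00:00", "2017-06-25 07:00:00", 6),
    (2, "2017-06-25 09:00:00", "2017-06-25 08:00:00", 5),
    (1, "2017-06-25 09:00:00", "2017-06-25 08:00:00", 5),
    (2, "2017-06-25 08:00:00", "2017-06-25 07:00:00", 5),
    (1, "2017-06-25 08:00:00", "2017-06-25 07:00:00", 5),
]

# row_number() stands in for PostgreSQL's DISTINCT ON, which SQLite lacks.
# rn = 1 marks the row from the latest run for each (id, performance_hour).
QUERY = """
select id, sum(value) as total
from (
    select m.*,
           row_number() over (
               partition by id, performance_hour
               order by run_hour desc
           ) as rn
    from metrics m
    where run_hour <= :run
      and performance_hour >= :start
      and performance_hour < :run
) latest
where rn = 1
group by id
order by id
"""


def latest_value_sums(rows, run, start):
    """Load rows into an in-memory table and sum the latest value per hour."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table metrics "
        "(id integer, run_hour text, performance_hour text, value integer)"
    )
    conn.executemany("insert into metrics values (?, ?, ?, ?)", rows)
    # Timestamps are stored as ISO text, so the parameters must use the same
    # full 'YYYY-MM-DD HH:MM:SS' format for string comparison to work.
    result = conn.execute(QUERY, {"run": run, "start": start}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_value_sums(ROWS, "2017-06-25 09:00:00", "2017-06-25 07:00:00"))
```

Note that this only picks among rows that exist inside the requested window; hours whose latest record lies before the window start would still need the gap-filling approach from the other answer.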
