sql - Running "distinct on" across all unique thresholds in a postgres table

Question

I have a Postgres 11 table called sample_a that looks like this:

 time | cat | val
------+-----+-----
    1 |   1 |   5
    1 |   2 |   4
    2 |   1 |   6
    3 |   1 |   9
    4 |   3 |   2

I would like to create a query that, for each unique timestep, takes the most recent value of each category at or before that timestep and aggregates those values by dividing their sum by their count (i.e. their average).

I believe I have the query to do this for a given timestep. For example, for time 3 I can run the following query:

select sum(val)::numeric / count(val) as result from (
    select distinct on (cat) * from sample_a where time <= 3 order by cat, time desc
) x;

and get 6.5. (At time 3, the latest value for category 1 is 9 and the latest value for category 2 is 4. There are 2 values, they sum to 13, and 13 / 2 = 6.5.)
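To reproduce the example, a minimal setup script for the sample rows is sketched below. The integer column types and NOT NULL constraints are assumptions; the question shows only the values.

```sql
-- Sample rows from the question.
-- Assumption: all three columns are integers (only values are shown above).
create table sample_a (
    time integer not null,
    cat  integer not null,
    val  integer not null
);

insert into sample_a (time, cat, val) values
    (1, 1, 5),
    (1, 2, 4),
    (2, 1, 6),
    (3, 1, 9),
    (4, 3, 2);
```

With these rows loaded, the single-timestep query above should return 6.5 for time 3.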

However, I would ideally like a single query that returns the result for every unique time in the table. Its output would look like this:

 time | result
------+----------
    1 |   4.5
    2 |   5
    3 |   6.5
    4 |   5

Ideally the new query would avoid adding another subselect and would be efficient. I could get these results by running the previous query from my application once per timestep, but that seems inefficient for a large sample_a.

What would this new query look like?

score 1 · Accepted Answer

See whether the performance of this approach is acceptable. The syntax may need some tweaking:

select t.time, avg(mr.val) as result
from (select distinct time from sample_a) t,
    lateral (
        select distinct on (a.cat) a.val
        from sample_a a
        where a.time <= t.time
        order by a.cat, a.time desc
    ) mr
group by t.time
order by t.time;
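Since the answer asks whether the performance is acceptable, one way to check is EXPLAIN ANALYZE, optionally after adding an index whose column order matches the lateral subquery's ORDER BY. This is a sketch under assumptions: the index name sample_a_cat_time_idx is hypothetical, the CREATE TABLE IF NOT EXISTS line (with assumed integer columns) is there only so the snippet runs on an empty database, and whether the planner actually uses the index depends on table size and statistics.

```sql
-- Only so the snippet runs standalone; assumed integer columns.
create table if not exists sample_a (time integer, cat integer, val integer);

-- Hypothetical index name. Its column order matches "order by a.cat, a.time desc",
-- so the planner can read rows in that order from the index instead of sorting.
create index if not exists sample_a_cat_time_idx on sample_a (cat, time desc);

-- Shows the chosen plan with actual row counts and timings.
explain analyze
select t.time, avg(mr.val) as result
from (select distinct time from sample_a) t,
    lateral (
        select distinct on (a.cat) a.val
        from sample_a a
        where a.time <= t.time
        order by a.cat, a.time desc
    ) mr
group by t.time
order by t.time;
```

Even with an index, the lateral subquery still reads every row with time <= t.time once per distinct time, so total work can grow roughly quadratically with the number of distinct times; comparing plans on realistic data shows whether that matters.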

score 0 · Accepted Answer

I think you just want cumulative functions:

select time,
       sum(sum_val) over (order by time) / sum(num_val) over (order by time) as result
from (select time, sum(val) as sum_val, count(*) as num_val
      from sample_a a
      group by time
     ) a
order by time;

Note that if val is an integer, you may need to cast it to numeric to get fractional values.

This can also be expressed without a subquery:

select time,
       sum(sum(val)) over (order by time) / sum(count(*)) over (order by time) as result
from sample_a
group by time
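The cumulative queries above average every row seen so far, not only the latest value per category: on the sample data they give 4.5, 5, 6 and 5.2 rather than the expected 4.5, 5, 6.5 and 5. A sketch of one way to keep the same running-window shape while matching the expected output (an alternative technique, not taken from either answer): each row contributes the change it makes to the running sum of latest values (its value minus the previous value of the same category) and adds 1 to the running count only the first time its category appears. It assumes (time, cat) is unique and val is never null; the names deltas, delta_val and delta_cnt are hypothetical, and the sample rows are inlined as a CTE so the query runs on its own.

```sql
with sample_a (time, cat, val) as (
    -- sample rows from the question; against the real table, drop this CTE
    values (1, 1, 5), (1, 2, 4), (2, 1, 6), (3, 1, 9), (4, 3, 2)
),
deltas as (
    select time,
           -- change to the sum of latest values: new value minus the
           -- category's previous value (0 if the category is new)
           val - coalesce(lag(val) over (partition by cat order by time), 0) as delta_val,
           -- 1 the first time a category appears, otherwise 0
           (row_number() over (partition by cat order by time) = 1)::int as delta_cnt
    from sample_a
)
select time,
       -- running sum of latest values divided by running number of categories seen
       sum(sum(delta_val)) over (order by time)::numeric
           / sum(sum(delta_cnt)) over (order by time) as result
from deltas
group by time
order by time;
```

On the sample rows this should return 4.5, 5, 6.5 and 5 for times 1 to 4 (Postgres may print the numeric results with trailing zeros). It scans the table once with two window passes instead of re-reading earlier rows for every timestep.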
