I have a Postgres 11 table called sample_a
that looks like this:
time | cat | val
------+-----+-----
1 | 1 | 5
1 | 2 | 4
2 | 1 | 6
3 | 1 | 9
4 | 3 | 2
I would like to create a query that for each unique timestep, gets the most recent values across each category at or before that timestep, and aggregates these values by taking the sum of these values and dividing by the count of these values.
I believe I have the query to do this for a given timestep. For example, for time 3
I can run the following query:
select sum(val)::numeric / count(val) as result from (
select distinct on (cat) * from sample_a where time <= 3 order by cat, time desc
) x;
and get 6.5
. (This is because at time 3
, the latest from category 1
is 9
and the latest from category 2
is 4
. The count of the values are 2
, and they sum up to 13
, and 13
/ 2
is 6.5
.)
However, I would ideally like to run a query that will give me all the results for each unique time in the table. The output of this new query would look as follows:
time | result
------+----------
1 | 4.5
2 | 5
3 | 6.5
4 | 5
This new query ideally would avoid adding another subselect clause if possible; an efficient query would be preferred. I could get these prior results by running the prior query inside my application for each timestep, but this doesn't seem efficient for a large sample_a
.
What would this new query look like?