sql - Saving only unique data points in SQL

Question

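For simplicity: we have a table with 2 columns, value and date.

A new reading arrives every second, and we want to save it with its timestamp. Because consecutive readings can be identical, to reduce storage usage we do not save a reading if it is the same as the previous entry.

Question: suppose the same value is received for 24 hours, so only the first value & date pair is saved. If we want to query "the average value over the last 1 hour", is there a way to make the database (PostgreSQL) see that no value was saved in the last hour and look up the last existing entry instead?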
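For illustration, the write-side rule described above might look roughly like the sketch below. The table name data, the column types, and the literal 42 standing in for the incoming reading are all assumptions, not part of the original setup.

```sql
-- Hypothetical schema: a table named "data" with the two columns from the question.
create table if not exists data (
    value numeric     not null,
    date  timestamptz not null
);

-- Store the incoming reading (the literal 42 is a placeholder) only when it
-- differs from the most recently stored value. On an empty table the subquery
-- yields NULL, so the first reading is always stored.
insert into data (value, date)
select v.new_value, now()
from (values (42::numeric)) as v(new_value)
where v.new_value is distinct from (
    select d.value
    from data d
    order by d.date desc
    limit 1
);
```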

score 1 · Accepted Answer

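This isn't as easy as it looks, and it's not just a matter of retrieving the latest data point when none is available in the last hour. You want to compute an average, so you need to rebuild the time series for that period, second by second, filling the gaps with the latest available data point.

I think the simplest approach is to use generate_series() to build the rows, then a lateral join to bring in the data: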

select avg(d.value) as avg_last_hour
from generate_series(now() - interval '1 hour', now(), interval '1 second') as t(ts)
cross join lateral (
    -- latest stored data point at or before this second
    select d.*
    from data d
    where d.date <= t.ts
    order by d.date desc
    limit 1
) d;
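If running one lookup per second turns out to be slow, a different technique, a time-weighted average, may give approximately the same result without generating a row per second. The sketch below is only an assumption-laden alternative, not part of the answer above: it reuses the data table with its value and date columns, and the names bounds, segments, seg_start and seg_end are made up for illustration. Each stored value is weighted by how long it stayed current inside the window.

```sql
with bounds as (
    -- the window to average over; now() is fixed for the whole statement
    select now() - interval '1 hour' as lo, now() as hi
),
segments as (
    select d.value,
           -- a value only counts from the window start onwards
           greatest(d.date, b.lo) as seg_start,
           -- it stays current until the next stored point, or the window end
           least(coalesce(lead(d.date) over (order by d.date), b.hi), b.hi) as seg_end
    from data d
    cross join bounds b
    where d.date <= b.hi
)
select sum(value * extract(epoch from (seg_end - seg_start)))
       / sum(extract(epoch from (seg_end - seg_start))) as avg_last_hour
from segments
where seg_end > seg_start;  -- drop segments that ended before the window began
```

The result can differ slightly from the per-second version, because that query samples both endpoints of the hour while this one weights by exact duration.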

score 0 · Accepted Answer

Hmmm . . . if you simply want the average of values in the most recent hour in the data, you can use:

select date_trunc('hour', date) as ddhh, avg(value)
from t
group by ddhh
order by ddhh desc
limit 1;

If you have a lot of data being collected, it might be faster to add an index on date and use:

select avg(value)
from t
where date >= date_trunc('hour', (select max(t2.date) from t t2));
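The index mentioned above could be created along these lines; the index name t_date_idx is made up for this example, and the table t with its date column is taken from the queries in this answer.

```sql
-- Plain b-tree index on the timestamp column, so that max(date) and the
-- range filter on date can be served from the index.
create index if not exists t_date_idx on t (date);
```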
