postgresql - Select data in 15-minute windows - PostgreSQL

Question

Right, so I have a table like this in PostgreSQL:

timestamp              duration

2013-04-03 15:44:58    4
2013-04-03 15:56:12    2
2013-04-03 16:13:17    9
2013-04-03 16:16:30    3
2013-04-03 16:29:52    1
2013-04-03 16:38:25    1
2013-04-03 16:41:37    9
2013-04-03 16:44:49    1
2013-04-03 17:01:07    9
2013-04-03 17:07:48    1
2013-04-03 17:11:00    2
2013-04-03 17:11:16    2
2013-04-03 17:15:17    1
2013-04-03 17:16:53    4
2013-04-03 17:20:37    9
2013-04-03 17:20:53    3
2013-04-03 17:25:48    3
2013-04-03 17:29:26    1
2013-04-03 17:32:38    9
2013-04-03 17:36:55    4

I would like to get the following output:

timestampwindowstart = 2013-04-03 15:44:58

duration    count
1           0
2           1
3           0
4           1
9           0

timestampwindowstart = 2013-04-03 15:59:58

duration    count
1           0
2           0
3           0
4           0
9           1

timestampwindowstart = 2013-04-03 16:14:58

duration    count
1           1
2           0
3           1
4           0
9           0

timestampwindowstart = 2013-04-03 16:29:58

duration    count
1           2
2           0
3           0
4           0
9           1

ETC...

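So basically it steps through the timestamps in 15-minute windows and outputs the distinct duration values together with their frequency (count). The timestampwindowstart value is the earliest timestamp of the window (i.e. timestampwindowfinish = timestampwindowstart + 15 minutes).

That way I can plot a histogram for each 15-minute interval...

I have tried reading up on this, but it is a bit complicated for me and I don't have much time...

Thanks for your help!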

score 4 · Accepted Answer

Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21. I named my column tstamp instead of your timestamp.

with t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
    (select duration from tmp group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

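Brief explanation:

Calculate the minimum and maximum timestamps.
Generate 15-minute intervals between the minimum and the maximum.
Cross join the result with the unique duration values.
Left join the original data (the left join is important because it keeps every possible combination in the output, with a null duration wherever a given duration does not occur in a given interval).
Aggregate the data. count(null)=0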
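The first three steps, and the count(null)=0 remark, can be tried in isolation. The following is only a minimal sketch: it assumes a temporary table named tmp with columns tstamp and duration (as in the answer), loads just the first four rows from the question, and uses window_start as an illustrative alias that does not appear in the answer.

```sql
-- Assumed setup: temporary table tmp with the first four sample rows.
create temporary table tmp (tstamp timestamp, duration integer);

insert into tmp (tstamp, duration) values
  ('2013-04-03 15:44:58', 4),
  ('2013-04-03 15:56:12', 2),
  ('2013-04-03 16:13:17', 9),
  ('2013-04-03 16:16:30', 3);

-- Minimum/maximum, 15-minute series, and cross join with the distinct durations.
-- window_start is an illustrative alias, not a name from the answer.
select
  generate_series(a.mitstamp, a.matstamp, interval '15 minutes') as window_start,
  b.duration
from
  (select min(tstamp) as mitstamp, max(tstamp) as matstamp from tmp) a,
  (select distinct duration from tmp) b
order by
  window_start,
  b.duration;

-- count() skips nulls, so a grid row with no left-join match contributes 0.
select count(x) as matches
from (values (null::integer)) as v(x);
```

With these four rows the series produces three window starts (15:44:58, 15:59:58 and 16:14:58), each paired with the four distinct durations, so the first query returns 12 rows. The second query returns 0, which is why empty combinations show up with a count of zero after the left join.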

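If you have more tables, the algorithm can be applied to their union. Suppose we have three tables tmp1, tmp2, tmp3, all with columns tstamp and duration. We can extend the previous solution: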

with 

tmpout as (
  select tstamp, duration from tmp1 union all
  select tstamp, duration from tmp2 union all
  select tstamp, duration from tmp3
)

,t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
    (select duration from tmpout group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmpout tmp on
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

You should really learn about the with clause in PostgreSQL. It is an invaluable concept for any data analysis in PostgreSQL.
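As a small illustration of the with clause on its own, here is a hedged sketch that chains two named subqueries. The names samples, per_duration and n are made up for this example, and the three rows are copied from the question's sample data.

```sql
with
  -- Inline sample rows; samples is a hypothetical name for this sketch.
  samples(tstamp, duration) as (
    values
      (timestamp '2013-04-03 15:44:58', 4),
      (timestamp '2013-04-03 15:56:12', 2),
      (timestamp '2013-04-03 16:13:17', 9)
  ),
  -- A second named subquery can read from the first one.
  per_duration as (
    select duration, count(*) as n
    from samples
    group by duration
  )
select duration, n
from per_duration
order by duration;
```

Each named subquery can be referenced by the ones after it and by the final select, which is how tmpout and t build on each other in the queries above.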
