How to group data from a PipelineDB stream into N-minute intervals in a continuous view

A PipelineDB stream receives event data from many remote hosts. I need to group these events by type, IP, and 5-minute time interval, and count them.

So as input I have (very roughly):

time  | ip               | type      
------------------------------------
22:35 | 111.111.111.111  | page_open <-- new interval, ends at 22:40
22:36 | 111.111.111.111  | page_open
22:37 | 111.111.111.111  | page_close 
22:42 | 111.111.111.111  | page_close <-- event falls in the next interval, which ends at 22:45
22:42 | 222.111.111.111  | page_open 
22:43 | 222.111.111.111  | page_open
22:44 | 222.111.111.111  | page_close 
22:44 | 111.111.111.111  | page_open

And a select from the continuous view must return:

time  | ip               | type       | count
---------------------------------------------
22:40 | 111.111.111.111  | page_open  | 2
22:40 | 111.111.111.111  | page_close | 1
22:45 | 111.111.111.111  | page_open  | 1
22:45 | 111.111.111.111  | page_close | 1
22:45 | 222.111.111.111  | page_open  | 2
22:45 | 222.111.111.111  | page_close | 1

P.S. Sorry for my English.

1 Answer

You can use the date_round(column, interval) [0] function for this. For example,

CREATE CONTINUOUS VIEW bucketed AS
  SELECT date_round(time, '5 minutes') AS bucket, COUNT(*)
    FROM input_stream GROUP BY bucket;

[0] http://docs.pipelinedb.com/builtin.html?highlight=date_round
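
To get the exact shape asked for in the question, the same idea can be extended by also grouping on ip and type. The following is only a sketch under a few assumptions: the stream declaration (CREATE STREAM, as in the 0.9.x releases) and its column types are hypothetical, the view name events_by_5min is made up for illustration, and date_round is assumed to round down to the start of each bucket, as the linked documentation describes. Since the question labels each interval by its end (22:40, 22:45), the sketch adds 5 minutes when reading from the view rather than inside the view definition.

```sql
-- Hypothetical stream declaration; column names follow the question,
-- the types are an assumption.
CREATE STREAM input_stream (time timestamptz, ip inet, type text);

-- One row per (5-minute bucket, ip, type), with a running count.
-- date_round is assumed to return the START of the bucket.
CREATE CONTINUOUS VIEW events_by_5min AS
  SELECT date_round(time, '5 minutes') AS bucket,
         ip,
         type,
         COUNT(*) AS count
    FROM input_stream
   GROUP BY bucket, ip, type;

-- Example event written to the stream.
INSERT INTO input_stream (time, ip, type)
  VALUES ('2016-12-30 22:35', '111.111.111.111', 'page_open');

-- Label each bucket by its end time when reading, as in the question.
SELECT bucket + interval '5 minutes' AS interval_end, ip, type, count
  FROM events_by_5min
 ORDER BY interval_end, ip, type;
```

Keeping the stored grouping key as the plain date_round result and shifting it only in the final SELECT keeps the view definition as close as possible to the example above; whether the shifted expression could be used directly as the grouping key is not something this sketch relies on.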

Answered 2016-12-30T20:18:26.893