
I have a table:

create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');

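I want to write a statement that, given an event, returns the length of the "run" of events starting at that event. A run is defined as follows: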

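  1. Two events are in the same run if they are within 30 seconds of each other.
  2. If A and B are in a run together, and B and C are in a run together, then A and C are in a run together.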

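However, my query does not need to look backwards in time. So if I select on event 2, only events 2, 3 and 4 should count as part of the run starting at event 2, and the query should return 3 as the length of the run.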

Any ideas? I'm stumped.
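Because the events can be ordered by time, the transitive rule reduces to a simple check: starting at the given event, keep walking forward while each consecutive gap stays within 30 seconds. Below is a minimal Python sketch of that reading of the definition. It is not SQL, it only pins down the semantics, and it assumes that "within 30 seconds" means a gap of at most 30 s and that ties on event_time are broken by event_id.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time)
EVENTS = [
    (1, "2011-01-01 00:00:00"),
    (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"),
    (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"),
    (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"),
    (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"),
    (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]

# Assumption: "within 30 seconds" means a gap of at most 30 s.
MAX_GAP = timedelta(seconds=30)


def run_length(events, start_id, max_gap=MAX_GAP):
    """Number of events in the forward-only run that begins at start_id."""
    # Order by event_time, then event_id, so ties are deterministic.
    rows = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), eid) for eid, ts in events
    )
    start = next(i for i, (_, eid) in enumerate(rows) if eid == start_id)
    length = 1
    # Walk forward pairwise; the run ends at the first gap larger than max_gap.
    for (prev_t, _), (cur_t, _) in zip(rows[start:], rows[start + 1:]):
        if cur_t - prev_t > max_gap:
            break
        length += 1
    return length


print(run_length(EVENTS, 2))  # 3, as expected in the question
```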


3 Answers


It could look like this:

WITH x AS (
    SELECT event_time
          ,row_number() OVER w AS rn
          ,lead(event_time) OVER w AS next_time
    FROM   table1
    WHERE  event_id >= <start_id>
    WINDOW w AS (ORDER BY event_time, event_id)
    )
SELECT COALESCE(
      (SELECT x.rn
       FROM   x
       WHERE  (x.event_time + interval '30s') < x.next_time
       ORDER  BY x.rn
       LIMIT  1)
     ,(SELECT count(*) FROM x)
      ) AS run_length

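This version does not depend on a gapless sequence of IDs, only on event_time.
Rows with identical event_time are additionally ordered by event_id, so the order is unambiguous.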

Read about the window functions row_number() and lead(), and about CTEs (the WITH clause), in the manual.
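To follow the query step by step, here is a rough Python mirror of it, assuming the rows are already in memory. `rn` corresponds to row_number(), `next_time` to lead(event_time), and `None` stands in for SQL NULL. The function name `run_length_like_query` is made up for this sketch.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time)
EVENTS = [
    (1, "2011-01-01 00:00:00"), (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"), (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"), (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"), (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"), (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]


def run_length_like_query(events, start_id):
    # WHERE event_id >= <start_id>, WINDOW w AS (ORDER BY event_time, event_id)
    x = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), eid)
        for eid, ts in events
        if eid >= start_id
    )
    for i, (event_time, _) in enumerate(x):
        rn = i + 1  # row_number() OVER w
        # lead(event_time) OVER w; None plays the role of NULL on the last row
        next_time = x[i + 1][0] if i + 1 < len(x) else None
        # (event_time + interval '30s') < next_time is never true when next_time is NULL
        if next_time is not None and event_time + timedelta(seconds=30) < next_time:
            return rn  # ORDER BY x.rn LIMIT 1: the first row followed by a gap
    # COALESCE fallback: no gap found, so the run extends to the last row
    return len(x)


print(run_length_like_query(EVENTS, 2))  # 3
```

Starting at event 9 exercises the COALESCE branch, because that run continues to the last row of the table.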

Edit

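If we cannot assume that a larger event_id has a later (or equal) event_time, use this instead of the first WHERE clause:

WHERE (event_time, event_id) >= (SELECT event_time, event_id FROM table1 WHERE event_id = <start_id>)

Rows with the same event_time as the starting row but a smaller event_id are still ignored.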

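In the special case where a run lasts until the end of the table, no end is found and the subquery returns no row. COALESCE then returns the count of all rows instead.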
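A small Python illustration of why the time-based filter matters. It uses hypothetical rows whose event_id values do not follow event_time order; these rows are invented for the example and are not from the question. The time-based variant compares (event_time, event_id) pairs, and Python tuple comparison is lexicographic, so rows that share the start row's event_time but have a smaller event_id are skipped.

```python
from datetime import datetime, timedelta

# Hypothetical rows in which event_id does NOT increase with event_time
SHUFFLED = [
    (10, "2011-01-01 00:00:00"),
    (3, "2011-01-01 00:00:20"),
    (7, "2011-01-01 00:00:40"),
    (1, "2011-01-01 00:02:00"),
]


def run_length(events, start_id, filter_by_time):
    # ORDER BY event_time, event_id
    rows = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), eid) for eid, ts in events
    )
    start_key = next(key for key in rows if key[1] == start_id)
    if filter_by_time:
        # keep rows whose (event_time, event_id) is >= the start row's pair
        x = [key for key in rows if key >= start_key]
    else:
        # original filter: event_id >= <start_id>
        x = [key for key in rows if key[1] >= start_id]
    for i in range(len(x) - 1):
        if x[i][0] + timedelta(seconds=30) < x[i + 1][0]:
            return i + 1
    return len(x)  # COALESCE fallback


print(run_length(SHUFFLED, 3, filter_by_time=False))  # 3: wrongly includes event 10
print(run_length(SHUFFLED, 3, filter_by_time=True))   # 2: events 3 and 7
```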

answered 2011-11-23T18:09:24.137

You can join the table to itself on a date-difference condition. Since this is Postgres, a simple minus does the job.

This subquery finds all records that are "start events", that is, every event record with no other event record in the 30 seconds before it:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
 where b.event_time is null) startevent

With a few changes, the same logic selects the "end" events instead:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
 where b.event_time is null) end_event
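The two anti-joins (LEFT JOIN ... WHERE b.event_time IS NULL) can be mirrored in Python to check which sample events they classify as starts and ends. This is only a sketch of the same conditions, a time difference strictly between 0 and 30 seconds, with the sample rows hardcoded.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time)
RAW = [
    (1, "2011-01-01 00:00:00"), (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"), (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"), (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"), (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"), (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]
EVENTS = [(eid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for eid, ts in RAW]

ZERO, WINDOW = timedelta(0), timedelta(seconds=30)


def start_events(events):
    # Keep a only if no b satisfies 0 < a.event_time - b.event_time < 30 s
    return [a_id for a_id, a_t in events
            if not any(ZERO < a_t - b_t < WINDOW for _, b_t in events)]


def end_events(events):
    # Same anti-join with the difference reversed: b.event_time - a.event_time
    return [a_id for a_id, a_t in events
            if not any(ZERO < b_t - a_t < WINDOW for _, b_t in events)]


print(start_events(EVENTS))  # [1, 5, 7, 8, 9]
print(end_events(EVENTS))    # [4, 6, 7, 8, 11]
```

Events 7 and 8 appear in both lists: each is a run of length one.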

Now we can join them together to match each start event with its end event:

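(Still writing... there are several ways to approach this. I'm assuming only the example has sequential ID numbers, so you'll want to join start events to end events on event_time rather than on the IDs.)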

Here is my final result... it nests quite a lot of subselects:

select a.start_id,
       case when a.event_id is null then t1.event_id::varchar else 'single event' end as end_id
from
  (select start_event.event_id as start_id,
          start_event.event_time as start_time,
          last_event.event_id,
          min(end_event.event_time - start_event.event_time) as min_interval
   from
     (select a.event_id, a.event_time
      from (select event_id, event_time from table1) a
      left join (select event_id, event_time from table1) b
        on a.event_time - b.event_time < '00:00:30'
       and a.event_time - b.event_time > '00:00:00'
      where b.event_time is null) start_event

   inner join

     (select a.event_id, a.event_time
      from (select event_id, event_time from table1) a
      left join (select event_id, event_time from table1) b
        on b.event_time - a.event_time < '00:00:30'
       and b.event_time - a.event_time > '00:00:00'
      where b.event_time is null) end_event
     on end_event.event_time > start_event.event_time

   -- check for single events (a start event that is also an end event)
   left join
     (select a.event_id, a.event_time
      from (select event_id, event_time from table1) a
      left join (select event_id, event_time from table1) b
        on b.event_time - a.event_time < '00:00:30'
       and b.event_time - a.event_time > '00:00:00'
      where b.event_time is null) last_event
     on start_event.event_id = last_event.event_id
   group by 1, 2, 3) a
left join table1 t1 on t1.event_time = a.start_time + a.min_interval

Results as start_id, end_id:

1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"

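I had to use a third left join to pick out single events, as a way of detecting events that are both a start event and an end event. The final result is in IDs; if you need information other than the ID, you can join back to the original table. I'm not sure how this solution will scale if you have millions of events... it could be a problem.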
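The pairing step of the final query can be sketched in Python as well. For each start event it takes the nearest end event strictly later in time, and it reports a start that is also an end as a single event. The sketch recomputes starts and ends with the same 0 < difference < 30 s conditions on the sample rows; the name `pair_runs` is made up.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time)
RAW = [
    (1, "2011-01-01 00:00:00"), (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"), (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"), (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"), (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"), (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]
EVENTS = [(eid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for eid, ts in RAW]
ZERO, WINDOW = timedelta(0), timedelta(seconds=30)


def pair_runs(events):
    times = dict(events)
    starts = [a for a, ta in events
              if not any(ZERO < ta - tb < WINDOW for _, tb in events)]
    ends = [a for a, ta in events
            if not any(ZERO < tb - ta < WINDOW for _, tb in events)]
    result = []
    for s in starts:
        # inner join on end_event.event_time > start_event.event_time
        later = [e for e in ends if times[e] > times[s]]
        if not later:
            continue  # the inner join drops a start with no later end event
        if s in ends:
            # the third left join: a start that is also an end is a single event
            result.append((s, "single event"))
        else:
            # min(end_event.event_time - start_event.event_time)
            result.append((s, str(min(later, key=times.get))))
    return result


print(pair_runs(EVENTS))
# [(1, '4'), (5, '6'), (7, 'single event'), (8, 'single event'), (9, '11')]
```

Like the SQL, this drops any start event that has no end event after it, for example a single event that is the very last row of the table.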

answered 2011-11-23T19:09:49.433

Here is a recursive CTE solution. (Islands-and-gaps problems lend themselves naturally to recursive CTEs.)

WITH RECURSIVE runrun AS (
    SELECT event_id, event_time
    , event_time - ('30 sec'::interval) AS low_time
    , event_time + ('30 sec'::interval) AS high_time
    FROM table1
    UNION
    SELECT t1.event_id, t1.event_time
    , LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
    , GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
    FROM table1 t1
    JOIN runrun rr ON t1.event_time >= rr.low_time
                  AND t1.event_time < rr.high_time
    )
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
    ;

Result:

 event_id |     event_time      |      low_time       |      high_time      
----------+---------------------+---------------------+---------------------
        2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
        3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
        4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)
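The recursive CTE keeps widening a [low_time, high_time) window until no further event falls inside it. Below is a simplified Python sketch of that fixed-point idea for a single starting event. It assumes the same half-open 30-second window as the query, so events exactly 30 s apart are not joined. Unlike the CTE, it tracks only one window instead of seeding one per row. `run_members` is a made-up name.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (event_id, event_time)
RAW = [
    (1, "2011-01-01 00:00:00"), (2, "2011-01-01 00:00:15"),
    (3, "2011-01-01 00:00:29"), (4, "2011-01-01 00:00:58"),
    (5, "2011-01-02 06:03:00"), (6, "2011-01-02 06:03:09"),
    (7, "2011-01-05 11:01:31"), (8, "2011-01-05 11:02:15"),
    (9, "2011-01-06 09:34:19"), (10, "2011-01-06 09:34:41"),
    (11, "2011-01-06 09:35:06"),
]
EVENTS = [(eid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for eid, ts in RAW]
GAP = timedelta(seconds=30)


def run_members(events, start_id, gap=GAP):
    times = dict(events)
    t0 = times[start_id]
    # Non-recursive term: the start event's own window
    low, high = t0 - gap, t0 + gap
    members = {start_id}
    changed = True
    while changed:  # repeat the recursive step until nothing new is added
        changed = False
        for eid, t in events:
            # JOIN condition: event_time >= low_time AND event_time < high_time
            if eid not in members and low <= t < high:
                members.add(eid)
                # LEAST / GREATEST widen the window
                low, high = min(low, t - gap), max(high, t + gap)
                changed = True
    # Forward-only, like WHERE rr.event_time >= <start time>
    return sorted(eid for eid in members if times[eid] >= t0)


print(run_members(EVENTS, 2))  # [2, 3, 4]
```

Event 1 also joins the window here, but the final filter removes it, which matches the three rows in the result above.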
answered 2011-11-23T22:42:32.030