
I have part of a table that looks like this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:28.775248 | Poll
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

I want to end up with this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

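The second row of the original table has been removed because it lies within 50 ms of the timestamp before or after it. It is important that rows are removed only when Source = 'Poll'.

I'm not sure how to achieve this with a WHERE clause.

Thanks in advance for your help.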


1 Answer


Whatever approach we take, we can restrict it to the Poll rows (the pools CTE below) and then union those rows with the Stream rows.

with 
streams as (
  select *
  from test
  where Source = 'Stream'
),
pools as (
  ...
)

(select * from pools) union (select * from streams) order by timestamp
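
To try the fragments below locally, a minimal setup might look like this. The table name test and the column names come from the queries in this answer; the column types are an assumption, PostgreSQL is assumed because of the interval syntax used later, and the rows are the four from the question.

```sql
-- Assumed schema: the types are a guess, the names follow the queries in this answer.
create table test (
  timestamp timestamp not null,
  Source    text      not null
);

-- The four sample rows from the question.
insert into test (timestamp, Source) values
  ('2017-07-28 14:20:28.757464', 'Stream'),
  ('2017-07-28 14:20:28.775248', 'Poll'),
  ('2017-07-28 14:20:29.777678', 'Poll'),
  ('2017-07-28 14:21:28.582532', 'Stream');
```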

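There are several options for building the pools CTE:

Correlated subquery

For each row, we run an extra query to fetch the previous timestamp with the same source, and then keep only the rows that have no previous timestamp (the first row) or whose previous timestamp is more than 50 ms earlier.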

with 
...
pools_with_prev as (
  -- use a correlated subquery to fetch the previous
  -- timestamp of the same source
  select 
    timestamp, Source, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    (select max(t2.timestamp) from test as t2 
      where t2.timestamp < test.timestamp and
        t2.Source = test.Source) 
      as timestamp_prev
  from test
  -- only Poll rows are candidates for removal
  where Source = 'Poll'
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then keep rows that are >50ms after the previous one
  where timestamp_prev is NULL or
    timestamp_prev < timestamp_prev_limit
)

...

https://www.db-fiddle.com/f/iVgSkvTVpqjNZ5F5RZVSd2/2

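Joining two shifted tables

Instead of running a subquery for every row, we can create a copy of the table and shift it by one row, so that each Poll row joins with the previous row of the same source type.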

with 
...
pools_rn as (
  -- add an extra row number column
  -- rows: 1, 2, 3
  select *,
    row_number() over (order by timestamp) as rn
  from test
  where Source = 'Poll'
),
pools_rn_prev as (
  -- add an extra row number column increased by one,
  -- like sliding a copy of the table one row down
  -- rows: 2, 3, 4
  select timestamp as timestamp_prev,
    row_number() over (order by timestamp) + 1 as rn
  from test
  where Source = 'Poll'
),
pools as (
  -- now join the previous two tables on this column,
  -- so each row joins with its predecessor
  select timestamp, Source
  from pools_rn
    left outer join pools_rn_prev
    on pools_rn.rn = pools_rn_prev.rn
  where
    -- then keep rows that are >50ms apart
    timestamp_prev is null or
    timestamp - interval '00:00:00.05' > timestamp_prev
)

...

https://www.db-fiddle.com/f/gXmSxbqkrxpvksE8Q4ogEU/2

Sliding window

Modern SQL can do something similar: partition by source, then use a window function to pair each row with the previous one.

with 
...
pools_with_prev as (
  -- use a window function to fetch the previous timestamp
  select *, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    lag(timestamp) over(
      partition by Source order by timestamp
    ) as timestamp_prev
  from test
  -- only Poll rows are candidates for removal
  where Source = 'Poll'
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then keep rows that are >50ms after the previous one
  where timestamp_prev is NULL or
    timestamp_prev < timestamp_prev_limit
)


...

https://www.db-fiddle.com/f/8KfTyqRBU62SFSoiZfpu6Q/1

I believe this is the most efficient of the three.
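
Note that the variants above compare a Poll row only with the previous row of the same source. If the intent is instead to drop a Poll row whenever any neighbouring row, before or after and of either source, lies within 50 ms (which is what the question's example suggests), a sketch along the following lines may be closer. It assumes PostgreSQL and the same test table; the names with_neighbours and timestamp_next are illustrative and not taken from the fiddles.

```sql
with with_neighbours as (
  -- fetch the timestamps immediately before and after each row,
  -- regardless of source
  select
    timestamp, Source,
    lag(timestamp)  over (order by timestamp) as timestamp_prev,
    lead(timestamp) over (order by timestamp) as timestamp_next
  from test
)
select timestamp, Source
from with_neighbours
where
  -- rows from other sources are always kept
  Source <> 'Poll'
  -- a Poll row is kept only if both neighbours are more than 50 ms away
  or (
    (timestamp_prev is null
      or timestamp - timestamp_prev > interval '50 milliseconds')
    and
    (timestamp_next is null
      or timestamp_next - timestamp > interval '50 milliseconds')
  )
order by timestamp;
```

On the four rows from the question this keeps the first Stream row, the second Poll row, and the last Stream row. The neighbours are taken from the original table, so two Poll rows within 50 ms of each other would both be dropped; whether that is the desired behaviour depends on the data.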

Answered 2017-09-16T01:01:36.653