
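I'm trying to find an efficient way to compare two rows in SQL Server 2008. I need to write a query that finds all rows in the Movement table where Speed < 10 for N consecutive rows.

The table structure is:

EventTime (DateTime), Speed

If the data is: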

2012-02-05 13:56:36.980, 2
2012-02-05 13:57:36.980, 11
2012-02-05 13:57:46.980, 2
2012-02-05 13:59:36.980, 2
2012-02-05 14:06:36.980, 22
2012-02-05 15:56:36.980, 2

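Then, if I look for 2 consecutive rows, the query should return rows 3 and 4 (13:57:46.980 / 13:59:36.980), and if I look for 3 consecutive rows it should return nothing. The data is ordered by EventTime (a DateTime) only.

Any help you can give would be great. I was considering a cursor, but cursors are usually inefficient. Also, this table has about 10 million rows, so the more efficient the better! :)

Thanks!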


3 Answers

DECLARE
  @n             INT,
  @speed_limit   INT
SELECT
  @n             = 5,
  @speed_limit   = 10

;WITH
  partitioned AS
(
  SELECT
    *,
    CASE WHEN speed < @speed_limit THEN 1 ELSE 0 END   AS PartitionID
  FROM
    Movement
)
,
  sequenced AS
(
  SELECT
    ROW_NUMBER() OVER (                         ORDER BY EventTime) AS MasterSeqID,
    ROW_NUMBER() OVER (PARTITION BY PartitionID ORDER BY EventTime) AS PartIDSeqID,
    *
  FROM
    partitioned
)
,
  filter AS
(
  SELECT
    MasterSeqID - PartIDSeqID    AS GroupID,
    MIN(MasterSeqID)             AS GroupFirstMastSeqID,
    MAX(MasterSeqID)             AS GroupFinalMastSeqID
  FROM
    sequenced
  WHERE
    PartitionID = 1
  GROUP BY
    MasterSeqID - PartIDSeqID
  HAVING
    COUNT(*) >= @n
)
SELECT
  sequenced.*
FROM
  filter
INNER JOIN
  sequenced
    ON  sequenced.MasterSeqID >= filter.GroupFirstMastSeqID
    AND sequenced.MasterSeqID <= filter.GroupFinalMastSeqID
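
To make the MasterSeqID - PartIDSeqID step easier to follow, here is a minimal Python sketch of the same idea, run against the sample rows from the question. It is only an illustration, not part of the T-SQL solution: the function name `find_runs` and its parameters are hypothetical. The point it shows is that the difference between the overall row number and the row number counted among slow rows only stays constant inside an unbroken run of slow rows.

```python
# Sample rows from the question: (EventTime, Speed)
rows = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]


def find_runs(rows, n, speed_limit):
    """Return every run of at least n consecutive rows with speed < speed_limit."""
    # Timestamps in this fixed format sort chronologically as plain strings.
    ordered = sorted(rows, key=lambda r: r[0])
    groups = {}
    slow_seq = 0  # plays the role of PartIDSeqID for the slow partition
    for master_seq, (event_time, speed) in enumerate(ordered, start=1):
        if speed < speed_limit:
            slow_seq += 1
            # Constant for as long as the run of slow rows is unbroken.
            group_id = master_seq - slow_seq
            groups.setdefault(group_id, []).append((event_time, speed))
    # Equivalent of HAVING COUNT(*) >= @n
    return [run for run in groups.values() if len(run) >= n]


for run in find_runs(rows, n=2, speed_limit=10):
    print(run)
```

With n=2 this finds the single run made of rows 3 and 4; with n=3 it finds nothing, which matches the behaviour the question asks for.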

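An alternative final step (inspired by @t-clausen-dk) that avoids the extra JOIN. It replaces the filter CTE and the final SELECT above. I would test both versions to see which performs better.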

,
  filter AS
(
  SELECT
    MasterSeqID - PartIDSeqID                              AS GroupID,
    COUNT(*) OVER (PARTITION BY MasterSeqID - PartIDSeqID) AS GroupSize,
    *
  FROM
    sequenced
  WHERE
    PartitionID = 1
)
SELECT
  *
FROM
  filter
WHERE
  GroupSize >= @n
answered 2012-10-29T12:14:56.437
declare @t table(EventTime datetime, Speed int)
insert @t values('2012-02-05 13:56:36.980', 2)
insert @t values('2012-02-05 13:57:36.980', 11)
insert @t values('2012-02-05 13:57:46.980', 2)
insert @t values('2012-02-05 13:59:36.980', 2)
insert @t values('2012-02-05 14:06:36.980', 22)
insert @t values('2012-02-05 15:56:36.980', 2)

declare @N int = 1  -- the final filter is cnt > @N, so 1 returns runs of at least 2 rows

;with a as
(
  select EventTime, Speed, row_number() over (order by EventTime) rn from @t
), b as
(
  select EventTime, Speed, 1 grp,  rn from a where rn = 1
  union all
  select a.EventTime, a.Speed, case when a.speed < 10 and b.speed < 10 then grp else grp + 1 end, a.rn
  from a join b on a.rn = b.rn+1
), c as
(
  select EventTime, Speed, count(*) over (partition by grp) cnt from b
)
select * from c
where cnt > @N
OPTION (MAXRECURSION 0) -- Thx Dems
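
The recursive CTE above walks the rows one at a time and only keeps the current grp value when both the current and the previous row are slow. As a rough illustration of that logic (not a replacement for the query), here is a Python sketch over the same sample data; `label_groups` and `rows_in_long_groups` are hypothetical names.

```python
from collections import Counter

# Sample rows from the question: (EventTime, Speed)
rows = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]


def label_groups(rows, speed_limit=10):
    """Attach a group number to each row, like CTE b in the query."""
    ordered = sorted(rows, key=lambda r: r[0])
    labelled = []
    grp = 1
    prev_speed = None
    for event_time, speed in ordered:
        # Keep the group only when this row and the previous one are both slow.
        both_slow = prev_speed is not None and speed < speed_limit and prev_speed < speed_limit
        if prev_speed is not None and not both_slow:
            grp += 1
        labelled.append((event_time, speed, grp))
        prev_speed = speed
    return labelled


def rows_in_long_groups(rows, n, speed_limit=10):
    """Return rows whose group has more than n members (cnt > @N in the query)."""
    labelled = label_groups(rows, speed_limit)
    counts = Counter(grp for _, _, grp in labelled)
    return [(t, s) for t, s, grp in labelled if counts[grp] > n]


print([grp for _, _, grp in label_groups(rows)])
print(rows_in_long_groups(rows, n=1))
```

Every fast row ends up alone in its own group, so it can never pass the cnt > @N filter as long as @N is at least 1.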
answered 2012-10-29T12:32:03.840

Almost the same idea as Dems, with a small difference:

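select * from (
  select eventtime, speed, rnk, new_rnk,
         rnk - new_rnk as grp,   -- constant within one unbroken run of slow rows
         -- run length = last rank - first rank + 1 inside the group
         max(rnk) over (partition by rnk - new_rnk) -
         min(rnk) over (partition by rnk - new_rnk) + 1 as no_consec
  from (
     -- new_rnk numbers only the rows that pass the speed filter;
     -- partitioning by the raw speed value would split runs with mixed speeds
     select eventtime, rnk, speed,
            row_number() over (order by eventtime) as new_rnk
     from (
             select eventtime, speed,
                    row_number() over (order by eventtime) as rnk
             from a
          ) a
     where a.speed < 5
  ) b                            -- SQL Server requires an alias on derived tables
) c
where no_consec >= 2
order by eventtime;              -- ORDER BY belongs on the outermost query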

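Here 5 is the speed limit and 2 is the minimum number of consecutive events. To keep the database setup script simple, I used numbers in place of dates.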

SQLFIDDLE

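Edit:

In response to the comments, I added three columns to the first inner query. To get only the first row of each group, add pos_in_group = 1 to the WHERE clause; the distance between the first and last event is then at your fingertips through min_date and max_date.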

SQLFIDDLE

select eventtime, speed, min_date, max_date, pos_in_group
from (
  select eventtime, speed, rnk, new_rnk,
         rnk - new_rnk as grp,   -- constant within one unbroken run of slow rows
         row_number() over (partition by rnk - new_rnk order by eventtime) as pos_in_group,
         min(eventtime) over (partition by rnk - new_rnk) as min_date,
         max(eventtime) over (partition by rnk - new_rnk) as max_date,
         max(rnk) over (partition by rnk - new_rnk) -
         min(rnk) over (partition by rnk - new_rnk) + 1 as no_consec
  from (
     -- new_rnk numbers only the rows that pass the speed filter
     select eventtime, rnk, speed,
            row_number() over (order by eventtime) as new_rnk
     from (
             select eventtime, speed,
                    row_number() over (order by eventtime) as rnk
             from a
          ) a
     where a.speed < 5
  ) b                            -- SQL Server requires an alias on derived tables
) c
where no_consec > 1
order by eventtime;              -- ORDER BY belongs on the outermost query
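
For readers who want to check what the edited query should produce, here is a small Python sketch that summarizes each run by its first event, last event, length, and duration. It deliberately uses a different technique, `itertools.groupby` over the time-ordered rows, rather than the rank-difference trick, and it uses the question's real timestamps instead of the numeric dates from the fiddle. The name `summarize_runs` and the dictionary keys are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (EventTime, Speed)
rows = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def summarize_runs(rows, speed_limit, min_len):
    """Return one summary per run of at least min_len consecutive slow rows."""
    ordered = sorted(rows, key=lambda r: r[0])
    summaries = []
    # groupby splits the ordered rows into maximal runs of slow / not-slow rows.
    for is_slow, group in groupby(ordered, key=lambda r: r[1] < speed_limit):
        run = list(group)
        if is_slow and len(run) >= min_len:
            first = datetime.strptime(run[0][0], TIME_FORMAT)
            last = datetime.strptime(run[-1][0], TIME_FORMAT)
            summaries.append({
                "first_event": run[0][0],   # like min_date
                "last_event": run[-1][0],   # like max_date
                "length": len(run),         # like no_consec
                "seconds": (last - first).total_seconds(),
            })
    return summaries


for summary in summarize_runs(rows, speed_limit=5, min_len=2):
    print(summary)
```

On the sample data this yields one run of length 2, starting at 13:57:46.980 and lasting 110 seconds.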
answered 2012-10-29T12:45:18.243