
Currently I have a table set up like this:

DeviceID      Timestamp            Value
----------------------------------------
Device1       1.1.2011 10:00:00    3
Device1       1.1.2011 10:00:01    4
Device1       1.1.2011 10:00:02    4
Device1       1.1.2011 10:00:04    3
Device1       1.1.2011 10:00:05    4
Device1       1.1.2011 14:23:14    8
Device1       1.1.2011 14:23:15    7
Device1       1.1.2011 14:23:17    4
Device1       1.1.2011 14:23:18    2

As you can see, these are values from a device with given timestamps (the column type is datetime).

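The problem is that the device can start and stop at any time, and the data contains no direct information about when a start or stop occurred. However, the list of timestamps makes it easy to tell when starts and stops happened: whenever two rows' timestamps are within five seconds of each other, they belong to the same measurement.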

Now I want to get a list like this from the data:

DeviceID      Started              Ended
Device1       1.1.2011 10:00:00    1.1.2011 10:00:05
Device1       1.1.2011 14:23:14    1.1.2011 14:23:18
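For reference, the grouping rule described above can be checked outside the database. The following is a minimal Python sketch, not a SQL answer. It assumes that "within five seconds" means a gap of at most 5 seconds, and `measurement_periods` and `MAX_GAP` are hypothetical names. It sorts the rows and makes one pass over them, starting a new period whenever the device changes or the gap to the previous row exceeds the threshold.

```python
from datetime import datetime, timedelta

# Assumption: "within five seconds" means a gap of at most 5 s.
MAX_GAP = timedelta(seconds=5)


def measurement_periods(rows):
    """Turn (device_id, timestamp) rows into (device_id, started, ended) tuples."""
    periods = []
    # Sorting by (device, time) makes each device's rows consecutive and ordered.
    for device_id, ts in sorted(rows):
        if periods and periods[-1][0] == device_id and ts - periods[-1][2] <= MAX_GAP:
            # Same device and close to the previous row: extend the current period.
            periods[-1] = (device_id, periods[-1][1], ts)
        else:
            # First row of a device, or the gap is too large: a new period starts.
            periods.append((device_id, ts, ts))
    return periods


sample = [("Device1", datetime.fromisoformat("2011-01-01 " + hms)) for hms in (
    "10:00:00", "10:00:01", "10:00:02", "10:00:04", "10:00:05",
    "14:23:14", "14:23:15", "14:23:17", "14:23:18")]
for device_id, started, ended in measurement_periods(sample):
    print(device_id, started, ended)
```

On the sample rows this yields the two periods shown in the desired output. A reference like this can be useful for validating a SQL query's results on a small subset.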

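So, does anyone have an idea how to do this quickly? All I can think of is using some kind of cursor and manually comparing each pair of datetimes. But I think that would get really slow, because we have to check every value in every row.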

So is there a better SQL solution that doesn't use cursors?

Update

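I have now tested all the answers given so far. Reading through them, they all look good, and there are some interesting approaches. Unfortunately, all of them (so far) fail on the real data. The biggest problem seems to be the sheer volume of data (the table currently has about 3.5 million rows). Running the queries on a small subset gives the expected results, but running them on the whole table leads to very poor performance.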

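I'll have to test further and check whether I can split the data into chunks and pass only part of it to one of these algorithms to get this working. But maybe one of you has another clever idea for getting the results faster.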

Update (more information about the structure)

OK, this information might also help: the table currently holds about 3.5 million rows. Here are the column types and indexes:

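  • _ID
    • int
    • primary key
    • clustered index
    • not mentioned in my example because this query doesn't need it
  • DeviceID
    • int
    • not null
    • indexed
  • Timestamp
    • datetime
    • not null
    • indexed
  • Value
    • several non-indexed columns of different types (int, real, tinyint)
    • all nullable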

Maybe this helps improve your existing (or new) solutions to the problem.


7 Answers

-- Table var to store the gaps
declare @T table
(
  DeviceID varchar(10),
  PrevPeriodEnd datetime,
  NextPeriodStart datetime
)

-- Get the gaps
;with cte as 
(
  select *,
    row_number() over(partition by DeviceID order by Timestamp) as rn
  from data
)
insert into @T
select
  C1.DeviceID,
  C1.Timestamp as PrevPeriodEnd,
  C2.Timestamp as NextPeriodStart
from cte as C1
  inner join cte as C2
    on C1.rn = C2.rn-1 and
       C1.DeviceID = C2.DeviceID and
       datediff(s, C1.Timestamp, C2.Timestamp) > 5

-- Build islands from gaps in @T
;with cte1 as
(
  -- Add first and last timestamp to gaps
  select DeviceID, PrevPeriodEnd, NextPeriodStart
  from @T
  union all
  select DeviceID, max(TimeStamp) as PrevPeriodEnd, null as NextPeriodStart
  from data
  group by DeviceID
  union all
  select DeviceID, null as PrevPeriodEnd, min(TimeStamp) as NextPeriodStart
  from data
  group by DeviceID
),
cte2 as
(
  select *,
    row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
  from cte1
)
select
  C1.DeviceID,
  C1.NextPeriodStart as PeriodStart,
  C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
  inner join cte2 as C2
    on C1.DeviceID = C2.DeviceID and
       C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart       
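To make the two steps of this query easier to follow, here is a minimal Python sketch of the same gaps-then-islands logic for a single device. The helper name `periods_from_gaps` is hypothetical, and the sketch assumes the device has at least one row. Step 1 collects consecutive rows that are more than 5 seconds apart. Step 2 brackets those gaps with the first and last timestamps, then pairs each bound with the next one.

```python
from datetime import datetime, timedelta

# Assumed threshold, matching datediff(s, ...) > 5 in the query above.
MAX_GAP = timedelta(seconds=5)


def periods_from_gaps(timestamps):
    """Gaps-then-islands for ONE device; assumes at least one timestamp."""
    ts = sorted(timestamps)
    # Step 1 (the @T table): consecutive rows more than 5 s apart form a gap.
    gaps = [(prev, nxt) for prev, nxt in zip(ts, ts[1:]) if nxt - prev > MAX_GAP]
    # Step 2 (cte1): add (NULL, first timestamp) and (last timestamp, NULL),
    # in the order cte2 numbers them (the NULL PrevPeriodEnd sorts first).
    bounds = [(None, ts[0])] + gaps + [(ts[-1], None)]
    # Final join on rn = rn - 1: an island runs from the NextPeriodStart of one
    # bound to the PrevPeriodEnd of the following bound.
    return [(a[1], b[0]) for a, b in zip(bounds, bounds[1:])]


sample = [datetime.fromisoformat("2011-01-01 " + hms) for hms in (
    "10:00:00", "10:00:01", "10:00:02", "10:00:04", "10:00:05",
    "14:23:14", "14:23:15", "14:23:17", "14:23:18")]
for started, ended in periods_from_gaps(sample):
    print(started, ended)
```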
answered 2011-05-18T10:24:06.330

Try this:

select DeviceID,MIN(Timestamp),MAX(Timestamp) 
          from @table group by DATEPART(hh,Timestamp),DeviceID
answered 2011-05-16T14:42:57.377

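I've played around with some of the data types and names (just because I can, and because timestamp is a reserved word), and with your sample data I can get the result you asked for.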

Sample data:

create table Measures (
    DeviceID int not null,
    Occurred datetime not null,
    Value int not null,
    constraint PK_Measures PRIMARY KEY (DeviceID,Occurred)
)
go
insert into Measures (DeviceID,Occurred,Value)
select 1,'2011-01-01T10:00:00',3 union all
select 1,'2011-01-01T10:00:01',4 union all
select 1,'2011-01-01T10:00:02',4 union all
select 1,'2011-01-01T10:00:04',3 union all
select 1,'2011-01-01T10:00:05',4 union all
select 1,'2011-01-01T14:23:14',8 union all
select 1,'2011-01-01T14:23:15',7 union all
select 1,'2011-01-01T14:23:17',4 union all
select 1,'2011-01-01T14:23:18',2

Now the query:

;with StartPeriods as (
    select m1.DeviceID,m1.Occurred as Started
    from Measures m1 left join Measures m2 on m1.DeviceID = m2.DeviceID and m2.Occurred < m1.Occurred and DATEDIFF(second,m2.Occurred,m1.Occurred) < 6
    where m2.DeviceID is null
), ExtendPeriods as (
    select DeviceID,Started,Started as Ended from StartPeriods
    union all
    select
        ep.DeviceID,ep.Started,m2.Occurred
    from
        ExtendPeriods ep
            inner join
        Measures m2
            on
                ep.DeviceID = m2.DeviceID and
                ep.Ended < m2.Occurred and
                DATEDIFF(SECOND,ep.Ended,m2.Occurred) < 6
)
select DeviceID,Started,MAX(Ended) as Ended from ExtendPeriods group by DeviceID,Started
option (maxrecursion 0) -- lift the default limit of 100 recursion levels

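The StartPeriods common table expression (CTE) finds rows in Measures that have no earlier row within 5 seconds. The ExtendPeriods CTE then recursively extends these periods by finding new rows in Measures that occur at most 5 seconds after the current end of a period found so far.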

Finally, for each start we take the row whose end is furthest from it (MAX(Ended)).
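A minimal Python sketch of the StartPeriods/ExtendPeriods idea for one device might look like this. The name `periods_by_extension` is hypothetical. The 5-second threshold assumes whole-second timestamps, where DATEDIFF(second, ...) < 6 means at most 5 seconds. One difference is worth noting: the sketch keeps a set of reached rows, whereas the recursive CTE produces a separate row for every path through the data. That can multiply the intermediate rows considerably on long measurements.

```python
from datetime import datetime, timedelta

# Assumed threshold: DATEDIFF(second, ...) < 6 on whole-second data means <= 5 s.
MAX_GAP = timedelta(seconds=5)


def periods_by_extension(timestamps):
    """StartPeriods + ExtendPeriods for the timestamps of ONE device."""
    ts = sorted(set(timestamps))
    # StartPeriods: rows with no earlier row at most 5 s before them.
    starts = [t for t in ts if not any(t - MAX_GAP <= p < t for p in ts)]
    periods = []
    for started in starts:
        # ExtendPeriods: repeatedly hop from the current ends to later rows within 5 s.
        # The set collapses duplicate paths; the recursive CTE keeps each path as a row.
        frontier, ended = {started}, started
        while frontier:
            frontier = {m for e in frontier for m in ts if e < m <= e + MAX_GAP}
            if frontier:
                ended = max(ended, max(frontier))
        # GROUP BY DeviceID, Started with MAX(Ended): keep the furthest end reached.
        periods.append((started, ended))
    return periods


sample = [datetime.fromisoformat("2011-01-01 " + hms) for hms in (
    "10:00:00", "10:00:01", "10:00:02", "10:00:04", "10:00:05",
    "14:23:14", "14:23:15", "14:23:17", "14:23:18")]
for started, ended in periods_by_extension(sample):
    print(started, ended)
```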

answered 2011-05-16T14:46:31.823

Try this, although I'm not sure how it will perform with large amounts of data:

SELECT a.TS AS [StartTime], (SELECT TOP 1 c.TS FROM TestTime c WHERE c.TS >= a.TS AND
    NOT EXISTS(SELECT * FROM TestTime d WHERE d.TS > c.TS AND DATEDIFF(SECOND, c.TS, d.TS) <= 5) ORDER BY c.TS) AS [StopTime]
FROM TestTime a WHERE NOT EXISTS (SELECT * FROM TestTime b WHERE a.TS > b.TS AND DATEDIFF(SECOND, b.TS, a.TS) <= 5)

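My table is called TestTime and the column is TS, so adjust these for your table. I've used NOT EXISTS to check for a timestamp earlier than the current record and within 5 seconds of it. If none is found, the record is displayed, i.e. it is a start time (or the first record in the table). The query then looks for the lowest timestamp greater than or equal to that start time. This can be the start itself, if it is a single entry and therefore both start and stop. Again, NOT EXISTS checks for a later record within 5 seconds, so a timestamp is returned only if no such record is found (and only the first one, because of TOP 1). You can adjust and improve it, but it may be a good basis.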

Note that if the device is still running, the last time found is listed as the stop time of the last start event.

For simplicity I haven't included the device name here, so you'll need to add it to the StopTime subquery and the WHERE clauses.
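The logic of this query, including the per-device filter mentioned above, can be sketched in Python as follows. This is a brute-force illustration for checking the logic on small data, not a performance suggestion. `periods_not_exists` is a hypothetical name. "Within 5 seconds" is taken as a gap of at most 5 seconds, which matches DATEDIFF(SECOND, ...) <= 5 for whole-second timestamps.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=5)  # assumed: "within 5 seconds" means a gap <= 5 s


def periods_not_exists(rows):
    """rows: (device_id, timestamp) pairs -> (device_id, started, stopped) tuples."""
    periods = []
    for device_id, a in sorted(rows):
        # Only compare rows of the same device (the filter the answer leaves out).
        same = [t for d, t in rows if d == device_id]
        # WHERE NOT EXISTS (earlier row within 5 s): otherwise a is not a start.
        if any(b < a and a - b <= MAX_GAP for b in same):
            continue
        # StopTime: lowest c >= a with NOT EXISTS (a later row within 5 s of c).
        stop = min(c for c in same
                   if c >= a and not any(later > c and later - c <= MAX_GAP for later in same))
        periods.append((device_id, a, stop))
    return periods


def at(hms):
    return datetime.fromisoformat("2011-01-01 " + hms)


rows = [("Device1", at(h)) for h in ("10:00:00", "10:00:01", "10:00:02", "10:00:04",
                                     "10:00:05", "14:23:14", "14:23:15", "14:23:17",
                                     "14:23:18")]
rows += [("Device2", at("10:00:03")), ("Device2", at("10:00:20"))]
for period in periods_not_exists(rows):
    print(period)
```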

answered 2011-05-17T15:14:15.343
DECLARE @t TABLE
(DeviceID      VARCHAR(10),
 [Timestamp]    DATETIME,
 VALUE          INT
)

INSERT @t
SELECT 'Device1','20110101 10:00:00',    3
UNION SELECT 'Device1','20110101 10:00:01',    4
UNION SELECT 'Device1','20110101 10:00:02',    4
UNION SELECT 'Device1','20110101 10:00:04',   3
UNION SELECT 'Device1','20110101 10:00:05',    4
UNION SELECT 'Device1','20110101 14:23:14',    8
UNION SELECT 'Device1','20110101 14:23:15',    7
UNION SELECT 'Device1','20110101 14:23:17',    4
UNION SELECT 'Device1','20110101 14:23:18',    2


;WITH myCTE
AS
(
    SELECT DeviceID, [Timestamp],
           ROW_NUMBER() OVER (PARTITION BY DeviceID
                              ORDER BY [TIMESTAMP]
                             ) AS rn
    FROM @t
)
, recCTE
AS
(
    SELECT DeviceID, [Timestamp],  0 as groupID, rn FROM myCTE
    WHERE rn = 1

    UNION ALL

    SELECT r.DeviceID, g.[Timestamp],  CASE WHEN DATEDIFF(ss,r.[Timestamp], g.[Timestamp]) <= 5 THEN r.groupID ELSE r.groupID + 1 END, g.rn 
    FROM recCTE AS r
    JOIN myCTE AS g
    ON g.DeviceID = r.DeviceID -- stay within the same device
       AND g.rn = r.rn + 1
)
SELECT DeviceID, MIN([Timestamp]) AS [started], MAX([Timestamp]) AS ended
FROM recCTE
GROUP BY DeviceId, groupId
OPTION (MAXRECURSION 0);
answered 2011-05-16T14:51:42.057

You should be able to use window functions for this (the query below assumes that a gap of 15 minutes starts a new session):

SELECT DeviceId,
       Timestamp,
       COALESCE((Timestamp - lag(Timestamp) OVER w) > interval '15 min', TRUE)
       as session_begins,
       COALESCE((lead(Timestamp) OVER w - Timestamp) > interval '15 min', TRUE)
       as session_ends
FROM YourTable
WINDOW w AS (PARTITION BY DeviceId ORDER BY Timestamp);

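Depending on your where clause, you may want to drop the coalesce/true part, because otherwise the first/last rows fetched may be wrongly flagged as session boundaries.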

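If you only need the boundaries, you can use the above in a subquery and group by DeviceId, session_begins, session_ends having session_begins or session_ends. Also, if you do this, don't forget to put the where clause inside the subquery rather than in the main query. Otherwise you'll end up with a seq scan of the whole table because of the window aggregate.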
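The query above uses PostgreSQL syntax (interval literals and the WINDOW clause). The following minimal sketch applies the same LAG-based "session begins" flag in SQLite through Python's built-in sqlite3 module. It uses a 5-second gap instead of 15 minutes. It then turns the flags into Started/Ended pairs with a running SUM. That step is a common gaps-and-islands technique, not the answerer's exact query. The table and column names are hypothetical. The sketch assumes SQLite 3.25 or later, the first release with window functions, and stores timestamps as ISO-8601 text so that text order matches time order.

```python
import sqlite3

# Hypothetical table/column names; ISO-8601 text so string order = time order.
QUERY = """
WITH flagged AS (
    SELECT DeviceID, Occurred,
           -- 1 when there is no previous row or the gap exceeds 5 s
           COALESCE(
               CAST(strftime('%s', Occurred) AS INTEGER)
               - CAST(strftime('%s', LAG(Occurred) OVER (
                     PARTITION BY DeviceID ORDER BY Occurred)) AS INTEGER) > 5,
               1) AS session_begins
    FROM Measures
), numbered AS (
    SELECT DeviceID, Occurred,
           -- running count of session starts = session number per device
           SUM(session_begins) OVER (
               PARTITION BY DeviceID ORDER BY Occurred
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id
    FROM flagged
)
SELECT DeviceID, MIN(Occurred) AS Started, MAX(Occurred) AS Ended
FROM numbered
GROUP BY DeviceID, session_id
ORDER BY DeviceID, Started
"""


def periods_sqlite(rows):
    """rows: (device_id, 'YYYY-MM-DD HH:MM:SS') pairs -> (device, started, ended)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Measures (DeviceID TEXT NOT NULL, Occurred TEXT NOT NULL)")
        con.executemany("INSERT INTO Measures VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


rows = [("Device1", "2011-01-01 " + h) for h in (
    "10:00:00", "10:00:01", "10:00:02", "10:00:04", "10:00:05",
    "14:23:14", "14:23:15", "14:23:17", "14:23:18")]
rows.append(("Device2", "2011-01-01 10:00:03"))
for period in periods_sqlite(rows):
    print(period)
```

On SQL Server, the same shape should be expressible with DATEDIFF, LAG and SUM() OVER (... ORDER BY ...). Those window features require SQL Server 2012 or later. Whether this beats the other answers on 3.5 million rows would have to be measured.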

answered 2011-05-16T15:02:36.817

The basic idea of the following solution is borrowed from this answer.

WITH data (DeviceID, Timestamp, Value) AS (
  SELECT 'Device1', CAST('1.1.2011 10:00:00' AS datetime), 3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:01',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:02',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:04',              3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:05',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:14',              8 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:15',              7 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:17',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:18',              2
),
ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Timestamp)
  FROM data
),
starts AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn + 1
      AND r1.Timestamp <= DATEADD(second, 5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
),
ends AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn - 1
      AND r1.Timestamp >= DATEADD(second, -5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
)
SELECT
  s.DeviceID,
  Started = s.Timestamp,
  Ended = e.Timestamp
FROM starts s
  INNER JOIN ends e ON s.DeviceID = e.DeviceID AND s.rank = e.rank
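The starts/ends pairing works because, within one device, the k-th start and the k-th end always belong to the same period. A start is a row with no predecessor within 5 seconds, and an end is a row with no successor within 5 seconds. Here is a minimal Python sketch of that idea. `periods_by_rank` is a hypothetical name, and gaps of at most 5 seconds are treated as continuous.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=5)  # assumed threshold, as in DATEADD(second, 5, ...)


def periods_by_rank(rows):
    """rows: (device_id, timestamp) pairs -> (device_id, started, ended) tuples."""
    by_device = {}
    for device_id, ts in rows:
        by_device.setdefault(device_id, []).append(ts)
    periods = []
    for device_id in sorted(by_device):
        ts = sorted(by_device[device_id])  # like "ranked": rn = position in time order
        # starts: no previous row (rn - 1), or it is more than 5 s earlier.
        starts = [t for i, t in enumerate(ts) if i == 0 or t - ts[i - 1] > MAX_GAP]
        # ends: no next row (rn + 1), or it is more than 5 s later.
        ends = [t for i, t in enumerate(ts) if i == len(ts) - 1 or ts[i + 1] - t > MAX_GAP]
        # Join on s.rank = e.rank: the k-th start pairs with the k-th end.
        periods.extend((device_id, s, e) for s, e in zip(starts, ends))
    return periods


def at(hms):
    return datetime.fromisoformat("2011-01-01 " + hms)


rows = [("Device1", at(h)) for h in ("10:00:00", "10:00:01", "10:00:02", "10:00:04",
                                     "10:00:05", "14:23:14", "14:23:15", "14:23:17",
                                     "14:23:18")]
rows += [("Device2", at("10:00:03")), ("Device2", at("10:00:20"))]
for period in periods_by_rank(rows):
    print(period)
```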
answered 2011-05-17T15:01:56.020