0

我有 40 个如下所示的表,每个表包含 3000 万条记录。

RawData:PK( CaregoryID, Time)

CategoryID  Time                        IsSampled  Value
-----------------------------------------------------------
1           2012-07-01 00:00:00.000     0 -> 1     65.36347
1           2012-07-01 00:00:11.000     0          80.16729
1           2012-07-01 00:00:14.000     0          29.19716
1           2012-07-01 00:00:25.000     0 -> 1      7.05847
1           2012-07-01 00:00:36.000     0 -> 1     98.08257
1           2012-07-01 00:00:57.000     0          75.35524
1           2012-07-01 00:00:59.000     0          35.35524

截至目前,IsSampled所有记录的列均为 0。
我需要更新记录,以便对于每个 CategoryID 和每个分钟范围,具有 Max(Value)、Min(Value) 和第一条记录的记录应该为 1 IsSampled

以下是我创建的程序查询,但运行时间太长。(每桌约 2h 30m)

DECLARE @startRange datetime 
DECLARE @endRange datetime 
DECLARE @endTime datetime 
SET @startRange = '2012-07-01 00:00:00.000'
SET @endTime = '2012-08-01 00:00:00.000'

WHILE (@startRange < @endTime)
BEGIN
    SET @endRange = DATEADD(MI, 1, @startRange)

    UPDATE r1
    SET IsSampled = 1
    FROM RawData AS r1
    JOIN 
    (
      SELECT r2.CategoryID, 
             MAX(Value) as MaxValue, 
             MIN(Value) as MinValue, 
             MIN([Time]) AS FirstTime
      FROM RawData AS r2
      WHERE @startRange <= [Time] AND [Time] < @endRange
      GROUP BY CategoryID
    ) as samples
    ON r1.CategoryID = samples.CategoryID
       AND (r1.Value = samples.MaxValue 
            OR r1.Value = samples.MinValue 
            OR r1.[Time] = samples.FirstTime)
       AND @startRange <= r1.[Time] AND r1.[Time] < @endRange

    SET @startRange = DATEADD(MI, 1, @startRange)   
END    

有没有办法更快地更新这些表(大概以非程序方式)?谢谢!

4

4 回答 4

1

我不确定它的性能如何,但它是一种比您当前的方法更基于集合的方法:

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)

;with BinnedValues as (
    select CategoryID,Time,IsSampled,Value,DATEADD(minute,DATEDIFF(minute,0,Time),0) as TimeBin
    from @T
), MinMax as (
    select CategoryID,Time,IsSampled,Value,TimeBin,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value) as MinPos,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value desc) as MaxPos,
        ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Time) as Earliest
    from
        BinnedValues
)
update MinMax set IsSampled = 1 where MinPos=1 or MaxPos=1 or Earliest=1

select * from @T

结果:

CategoryID  Time                   IsSampled Value
----------- ---------------------- --------- ---------------------------------------
1           2012-07-01 00:00:00.00 1         65.36347
1           2012-07-01 00:00:11.00 0         80.16729
1           2012-07-01 00:00:14.00 0         29.19716
1           2012-07-01 00:00:25.00 1         7.05847
1           2012-07-01 00:00:36.00 1         98.08257
1           2012-07-01 00:00:57.00 0         75.35524
1           2012-07-01 00:00:59.00 0         35.35524

TimeBin如果该列可以作为计算列添加到表中并添加到适当的索引中,则可能会加快速度。

还应该注意的是,这会将最多3 行标记为采样 - 如果最早的也是最小值或最大值,它只会被标记一次(显然),但下一个最近的最小值或最大值不会被标记。此外,如果多行具有相同的Value,即最小值或最大值,则将任意选择其中一行。

于 2012-07-12T06:08:27.223 回答
1

您可以将循环中的更新重写为:

   UPDATE r1
   SET   IsSampled = 1
   FROM  RawData r1
   WHERE r1.Time >= @startRange and Time < @endRange

   AND NOT EXISTS
    (
        select *
        from    RawData r2
        where   r2.CategoryID = r1.CategoryID
        and     r2.Time >= @startRange and r2.Time < @endRange 
        and     (r2.Time < r1.Time or r2.Value < r1.Value or r2.Value > r1.Value)
    )

要获得实际的性能改进,您需要在 Time 列上建立索引。

于 2012-07-12T06:08:55.970 回答
0

嗨试试这个查询。

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)


;WITH CTE as (SELECT CategoryID,CAST([Time] as Time) as time,IsSampled,Value FROM @T)

,CTE2 as (SELECT CategoryID,Max(time) mx,MIN(time) mn,'00:00:00.0000000' as start FROM CTE where time <> '00:00:00.0000000' Group by CategoryID)

update @T SET IsSampled=1
FROM CTE2 c inner join @T t on c.CategoryID = t.CategoryID and (CAST(t.[Time] as Time)=c.mx or CAST(t.[Time] as Time)=c.mn or CAST(t.[Time] as Time)=c.start)

select * from @T
于 2012-07-12T09:32:31.493 回答
0

嗨,这是最新更新的查询。检查查询的性能:

declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)


;WITH CTE as (SELECT CategoryID,Time,CAST([Time] as Time) as timepart,IsSampled,Value FROM @T)
--SELECT * FROM CTE

,CTE2 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE 
where timepart <> '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
 Group by CategoryID)

,CTE3 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE 
where timepart = '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
 Group by CategoryID)

update @T SET IsSampled=1
FROM @T t left join CTE2 c1
on (t.CategoryID = c1.CategoryID  and (t.Value = c1.mn or t.Value =c1.mx))
left join CTE3 c3 on(t.CategoryID = c3.CategoryID and t.Value = c3.mx)
where (c1.CategoryID is not null or c3.CategoryID is not null)


select * from @T
于 2012-07-12T10:34:46.807 回答