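I would appreciate help with the following scenario. In my project I have to process time-series data stored in a table. We are using Azure SQL Server.

The table dbo.batch_events has 1.4 billion rows. The table structure and sample data are shown in the screenshot below: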

[Screenshot: source table structure and sample data]

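I have to turn the equipment names stored in the equipment_name column of dbo.batch_events into column names and load the pivoted values into the dbo.time_series table.

If I pivot the equipment name column, dbo.time_series will end up with 685 columns.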
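To make the target shape concrete, here is a minimal sketch of the same pivot written as conditional aggregation. It uses only three of the 685 equipment names (taken from the list in my procedure further down), and the grouping columns are an assumption based on the columns that procedure selects. I have not measured whether this performs any differently from PIVOT.

```sql
-- Sketch only: three of the 685 equipment columns, written as conditional aggregation.
-- Assumption: the non-pivoted columns of dbo.time_series are the five grouping columns below.
SELECT
    Event_name,
    Time_stamp,
    Start_time,
    End_time,
    Duration,
    -- One output column per equipment name; NULL when that equipment has no row in the group.
    MAX(CASE WHEN Equipment_name = 'MY1102' THEN [Value] END) AS [MY1102],
    MAX(CASE WHEN Equipment_name = 'MY1138' THEN [Value] END) AS [MY1138],
    MAX(CASE WHEN Equipment_name = 'MY1180' THEN [Value] END) AS [MY1180]
FROM dbo.batch_events
GROUP BY
    Event_name, Time_stamp, Start_time, End_time, Duration;
```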

The screenshot below shows the structure of the target table (dbo.time_series) and the expected output for the sample data shown for the source table.

[Screenshot: target table structure and sample data]

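Please advise on the best way to write this query in SQL.

The query I wrote takes 25 hours to process the 1.4 billion records and load them into the target table.

I partitioned the source table (dbo.batch_events) by day on the time_stamp column and created two nonclustered indexes: one on the equipment name and one on the timestamp column.

Any suggestion on the best way to write the query for this scenario would be welcome.

The stored procedure I created processes one month of data at a time. Each month has about 120 million rows to process.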

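  1. Get the minimum and maximum dates from dbo.Batch_events and use a WHILE loop to insert one start_date/end_date entry per month into dbo.Iteration_ctrl. This gives me 12 entries in that table, each representing one month.

  2. Use a WHILE loop to iterate over the 12 start_date/end_date entries in dbo.Iteration_ctrl and, inside the loop, load the data into the dbo.Time_series table with a pivot query.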
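Step 1 does not strictly need a loop. Below is a minimal sketch of a set-based version. It assumes dbo.Iteration_ctrl has an identity ID column plus MIN_DATE and MAX_DATE columns (the names my procedure reads), and that calendar-month boundaries are acceptable instead of rolling months counted from the first timestamp.

```sql
DECLARE @MIN_TIME DATETIME, @MAX_TIME DATETIME;

SELECT @MIN_TIME = MIN(Time_stamp), @MAX_TIME = MAX(Time_stamp)
FROM dbo.BATCH_EVENTS;

-- First day of the month that contains the earliest timestamp.
DECLARE @FIRST_MONTH DATETIME = DATEADD(MONTH, DATEDIFF(MONTH, 0, @MIN_TIME), 0);

-- Recursive CTE: one row per calendar month, up to the month containing @MAX_TIME.
WITH months AS (
    SELECT @FIRST_MONTH AS MIN_DATE
    UNION ALL
    SELECT DATEADD(MONTH, 1, MIN_DATE)
    FROM months
    WHERE DATEADD(MONTH, 1, MIN_DATE) <= @MAX_TIME
)
INSERT INTO dbo.ITERATION_CTRL (MIN_DATE, MAX_DATE)
SELECT MIN_DATE, DATEADD(MONTH, 1, MIN_DATE)   -- MAX_DATE is exclusive (half-open range)
FROM months
ORDER BY MIN_DATE                              -- keeps the identity IDs in date order
OPTION (MAXRECURSION 1000);                    -- default limit is 100 recursion levels
```

Each row is a half-open range, so step 2 would filter with Time_stamp >= MIN_DATE AND Time_stamp < MAX_DATE rather than BETWEEN, which avoids loading a boundary row twice or skipping rows between two ranges.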

The stored procedure I wrote is below. It takes 25 hours, which seems inefficient to me. Any help would be greatly appreciated.

DECLARE @MIN_TIME DATETIME, @MIN_TIMESTAMP  DATETIME;
DECLARE @MAX_TIME DATETIME, @MAX_TIMESTAMP  DATETIME;
DECLARE @DATE DATETIME, @ROWCOUNT INT, @TOTALCOUNT INT;

SELECT @MIN_TIME = MIN(Time_stamp)  
FROM [dbo].[BATCH_EVENTS] 

SELECT @MAX_TIME = MAX(Time_stamp)  
FROM [dbo].[BATCH_EVENTS] 

PRINT 'INSERT INTO  TABLE  [dbo].[ITERATION_CTRL] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30)) 

WHILE @MIN_TIME < @MAX_TIME
BEGIN
    SELECT @DATE = DATEADD(MM, 01, @MIN_TIME)             
    SELECT @DATE = CASE WHEN @DATE > @MAX_TIME THEN @MAX_TIME ELSE @DATE END            

    INSERT INTO dbo.ITERATION_CTRL
        SELECT @MIN_TIME, @DATE

    PRINT 'INSERTION INTO TABLE  [dbo].[ITERATION_CTRL] HAS ENDED FOR'+ CAST(@MIN_TIME AS nvarchar(30)) + ' -' + CAST( @DATE AS nvarchar(30)) +' NO OF ROWS INSERTED :'+ CAST( @@ROWCOUNT AS nvarchar(30)) +' ' + CAST(GETDATE() AS nvarchar(30))      

    SELECT @MIN_TIME = DATEADD(SS, 01, @DATE)   
END

PRINT 'INSERT INTO  TABLE  [dbo].[Time_series_data] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30))              

SELECT @TOTALCOUNT = COUNT(*) FROM dbo.ITERATION_CTRL 
SELECT @ROWCOUNT = 1

WHILE @ROWCOUNT <= @TOTALCOUNT
BEGIN
    SELECT 
        @MIN_TIMESTAMP = MIN_DATE, 
        @MAX_TIMESTAMP = MAX_DATE  
    FROM dbo.ITERATION_CTRL 
    WHERE ID = @ROWCOUNT

    BEGIN TRANSACTION
       INSERT INTO dbo.Time_series_data
           SELECT *
           FROM 
               (SELECT 
                    [Event_name], [Time_Stamp],
                    [Start_time], [End_time], [Duration],
                    [Value] AS [Sensor_Value],
                    Equipment_name
                FROM 
                    [dbo].[BATCH_EVENTS] BE
                WHERE 
                    Time_stamp >= [Start_time] AND Time_stamp <= [End_time]
                    AND Time_stamp BETWEEN @MIN_TIMESTAMP AND @MAX_TIMESTAMP) t
           PIVOT 
               (MAX([Sensor_Value])
                    FOR Equipment_Name IN ([MY1102], [MY1138], [MY1180],
                                           [MY1164], [MY1176], [MY204],
                                           [MY324], [MY64B6])
               ) AS p   -- PIVOT needs its closing parenthesis and a table alias
          ORDER BY 
              [Time_Stamp], [Event_name]
  
        -- Capture the insert's row count now: COMMIT and the SELECT below both overwrite @@ROWCOUNT
        DECLARE @INSERTED_ROWS INT = @@ROWCOUNT

        COMMIT TRANSACTION

        SELECT @ROWCOUNT = @ROWCOUNT + 1

        --PRINT @MIN_TIMESTAMP, @MAX_TIMESTAMP
        PRINT 'INSERTION INTO TABLE   [Time_series_data] HAS ENDED NO OF ROWS INSERTED :'+ CAST( @INSERTED_ROWS AS nvarchar(30)) +' For duration ' + CAST( @MIN_TIMESTAMP AS nvarchar(30))+ '   '+  CAST( @MAX_TIMESTAMP AS nvarchar(30))+' Time '+ CAST(GETDATE() AS nvarchar(30));
    END
END