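I would appreciate your help with the following scenario. In my project I have to process time-series data stored in a table. We are using Azure SQL Server.
The table dbo.batch_events has 1.4 billion rows. See the screenshot below for the table structure and sample data: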
I have to pivot the equipment names in the equipment_name column of dbo.batch_events into column names and load the pivoted values into the dbo.time_series table.
If I pivot the equipment name column, I end up with 685 columns in the dbo.time_series table.
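See the screenshot below, which shows the structure of the target table (dbo.time_series) and the expected output for the sample data shown for the source table.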
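To make the intended transformation concrete without the screenshots, here is a minimal, self-contained sketch. The table variable, its column types, and the three sample rows are hypothetical stand-ins for dbo.batch_events, and only two of the 685 equipment names are used.

```sql
-- Hypothetical stand-in for dbo.batch_events; column types are assumed.
DECLARE @batch_events TABLE (
    event_name     varchar(50),
    time_stamp     datetime,
    equipment_name varchar(50),
    [value]        float
);

-- Made-up sample rows, for illustration only.
INSERT INTO @batch_events (event_name, time_stamp, equipment_name, [value])
VALUES ('Batch_A', '2020-01-01T00:00:00', 'MY1102', 1.5),
       ('Batch_A', '2020-01-01T00:00:00', 'MY1138', 2.5),
       ('Batch_A', '2020-01-01T00:01:00', 'MY1102', 3.5);

-- One output row per (event_name, time_stamp); each equipment name becomes a column.
SELECT event_name, time_stamp, [MY1102], [MY1138]
FROM (SELECT event_name, time_stamp, equipment_name, [value]
      FROM @batch_events) AS src
PIVOT (MAX([value]) FOR equipment_name IN ([MY1102], [MY1138])) AS p
ORDER BY time_stamp, event_name;
```

With these rows the result has two rows: the 00:00:00 row carries a value in both equipment columns, while the 00:01:00 row has NULL in MY1138 because no reading exists for that equipment at that timestamp.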
Please advise on the best way to write this query in SQL.
The query I wrote takes 25 hours to process the 1.4 billion records and load them into the target table.
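I have partitioned the source table (dbo.batch_events) by day on the time_stamp column and created two non-clustered indexes: one on the equipment name column and one on the timestamp column.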
I would appreciate suggestions on the best way to write the query for this scenario.
The stored procedure I created processes one month of data at a time; each month contains roughly 120 million rows.
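It first takes the minimum and maximum dates from the dbo.Batch_events table and uses a WHILE loop to insert one entry per month, with a start_date and an end_date, into the dbo.Iteration_ctrl table. So I have 12 entries in this table, each representing one month.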
It then uses a second WHILE loop to iterate over the 12 start_date and end_date entries in dbo.Iteration_ctrl and, inside that loop, runs a PIVOT query to load the data into the dbo.Time_series table.
Below is the stored procedure I wrote. It takes 25 hours, which seems inefficient to me. Any help would be greatly appreciated.
DECLARE @MIN_TIME DATETIME, @MIN_TIMESTAMP DATETIME;
DECLARE @MAX_TIME DATETIME, @MAX_TIMESTAMP DATETIME;
DECLARE @DATE DATETIME, @ROWCOUNT INT, @TOTALCOUNT INT;

-- Overall time range covered by the source table
SELECT @MIN_TIME = MIN(Time_stamp),
       @MAX_TIME = MAX(Time_stamp)
FROM [dbo].[BATCH_EVENTS];

PRINT 'INSERT INTO TABLE [dbo].[ITERATION_CTRL] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30));

-- Build one control row per month-long window
WHILE @MIN_TIME < @MAX_TIME
BEGIN
    -- Window end: one month after its start, capped at the overall maximum
    SELECT @DATE = DATEADD(MM, 1, @MIN_TIME);
    SELECT @DATE = CASE WHEN @DATE > @MAX_TIME THEN @MAX_TIME ELSE @DATE END;

    INSERT INTO dbo.ITERATION_CTRL (MIN_DATE, MAX_DATE)
    SELECT @MIN_TIME, @DATE;

    PRINT 'INSERTION INTO TABLE [dbo].[ITERATION_CTRL] HAS ENDED FOR ' + CAST(@MIN_TIME AS nvarchar(30)) + ' - ' + CAST(@DATE AS nvarchar(30)) + ' NO OF ROWS INSERTED: ' + CAST(@@ROWCOUNT AS nvarchar(30)) + ' ' + CAST(GETDATE() AS nvarchar(30));

    -- The next window starts one second after this one ends, because the load
    -- below filters with an inclusive BETWEEN (rows with sub-second timestamps
    -- inside that one-second gap would not be picked up)
    SELECT @MIN_TIME = DATEADD(SS, 1, @DATE);
END
PRINT 'INSERT INTO TABLE [dbo].[Time_series_data] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30));

DECLARE @ROWS_INSERTED INT;  -- rows loaded by the current iteration

SELECT @TOTALCOUNT = COUNT(*) FROM dbo.ITERATION_CTRL;
SELECT @ROWCOUNT = 1;

-- Load one window per iteration
WHILE @ROWCOUNT <= @TOTALCOUNT
BEGIN
    SELECT @MIN_TIMESTAMP = MIN_DATE,
           @MAX_TIMESTAMP = MAX_DATE
    FROM dbo.ITERATION_CTRL
    WHERE ID = @ROWCOUNT;

    BEGIN TRANSACTION;

    INSERT INTO dbo.Time_series_data
    SELECT *
    FROM (SELECT [Event_name], [Time_Stamp],
                 [Start_time], [End_time], [Duration],
                 [Value] AS [Sensor_Value],
                 Equipment_name
          FROM [dbo].[BATCH_EVENTS] BE
          WHERE Time_stamp >= [Start_time] AND Time_stamp <= [End_time]
            AND Time_stamp BETWEEN @MIN_TIMESTAMP AND @MAX_TIMESTAMP) t
    PIVOT (MAX([Sensor_Value])
           FOR Equipment_Name IN ([MY1102], [MY1138], [MY1180],
                                  [MY1164], [MY1176], [MY204],
                                  [MY324], [MY64B6])) AS p
    ORDER BY [Time_Stamp], [Event_name];

    -- Capture the row count immediately; any later statement resets @@ROWCOUNT
    SELECT @ROWS_INSERTED = @@ROWCOUNT;

    COMMIT TRANSACTION;

    SELECT @ROWCOUNT = @ROWCOUNT + 1;

    PRINT 'INSERTION INTO TABLE [Time_series_data] HAS ENDED NO OF ROWS INSERTED: ' + CAST(@ROWS_INSERTED AS nvarchar(30)) + ' For duration ' + CAST(@MIN_TIMESTAMP AS nvarchar(30)) + ' ' + CAST(@MAX_TIMESTAMP AS nvarchar(30)) + ' Time ' + CAST(GETDATE() AS nvarchar(30));
END
END