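The problem comes from a real environment where the production_plan table captures the order identification and other details in each row. Each row is updated when production of the product starts and again after it finishes, to capture the UTC time of both events.
There is a separate table, temperatures, that collects several temperatures from the production line at regular intervals, independently of anything else, stored together with the UTC time.
The goal is to extract the sequence of temperatures measured during the production of each product. (The temperatures should then be processed, a chart of the values created, and the chart attached to the product item documentation for audit purposes.)
Updated after marc_s's comment. The original question did not consider any indexes; the updated text takes them into account as shown below. The original measurements are mentioned in the comments.
The tables and indexes were created this way: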
CREATE TABLE production_plan (
order_id nvarchar(50) NOT NULL,
production_line uniqueidentifier NULL,
prod_start DATETIME NULL,
prod_end DATETIME NULL
);
-- About 31 000 rows inserted, ordered by order_id.
...
-- Clustered index ind_order_id on order_id.
CREATE CLUSTERED INDEX ind_order_id
ON production_plan (order_id ASC);
-- Non-clustered indices on the other columns.
CREATE INDEX ind_times
ON production_plan (production_line ASC, prod_start ASC, prod_end ASC);
------------------------------------------------------
-- There are actually more temperatures for one time (i.e. more
-- sensors). The UTC is the real time of the row insertion, hence
-- the primary key.
CREATE TABLE temperatures (
UTC datetime PRIMARY KEY NOT NULL,
production_line uniqueidentifier NULL,
temperature_1 float NULL
);
-- About 91 000 rows inserted ordered by UTC.
...
-- Clustered index on UTC is created automatically
-- because of the PRIMARY KEY. Indices on temperature(s)
-- do not make sense.
-- Non-clustered index for production_line
CREATE INDEX ind_pl
ON temperatures (production_line ASC);
-- The tables were created, records inserted, and the indices
-- created in less than 1 second (for the sample on my computer).
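The idea is to join the tables first on the production_line identification, and second on the temperature UTC time falling between the UTC times of the start and end of the item's production: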
-- About 45 000 rows in about 24 seconds when no indices were used.
-- The same took less than one second with the indices (for my data
-- and my computer).
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table02
FROM production_plan AS pp
JOIN temperatures AS t
ON pp.production_line = t.production_line
AND t.UTC BETWEEN pp.prod_start
AND pp.prod_end
ORDER BY t.UTC;
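A time of about 24 seconds was not acceptable. Clearly, the indexes were necessary. With them, the same operation took less than 1 second (the time shown in the yellow line below the result tab in Microsoft SQL Server Management Studio).
However...
The second problem remains
Because the temperature measurements are not very frequent, and because the place of measurement is slightly shifted in time relative to the start of production, a time correction has to be applied. In other words, two offsets must be added to the boundaries of the time range. I ended up with this query: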
-- About 46 000 rows in about 9 minutes without indices.
-- It took about the same also with indices
-- (8:50 instead of 9:00 or so).
DECLARE @offset_start INT;
SET @offset_start = -60; -- one minute = one sample before
DECLARE @offset_end INT;
SET @offset_end = +60; -- one minute = one sample after
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table03
FROM production_plan AS pp
JOIN temperatures AS t
ON pp.production_line = t.production_line
AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
AND DATEADD(second, @offset_end, pp.prod_end)
ORDER BY t.UTC;
With the DATEADD() calculation, the query takes about 9 minutes, almost regardless of whether the indexes exist.
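Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (the UTC values with the offsets added) need their own indexes to be processed efficiently. I thought of creating a temporary table. Indexes could then be created on its corrected columns. Using one more JOIN after that should help. The table could then be dropped.
Is the basic idea of the temporary table correct? Is there another technique for doing this?
Thank you for your suggestions. I will update the timing results after introducing the indexes you suggest. Please explain the reason for the expected improvement. I am a beginner when it comes to hands-on experience with writing SQL solutions.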
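To make the idea concrete, here is a minimal sketch of what I have in mind. It has not been measured, and the names #pp_corrected, corr_start, corr_end, ind_corr_times and result_table04 are only placeholders I made up for illustration:

```sql
-- Sketch only (untested timing): materialize the corrected boundaries once,
-- index them, join against temperatures, then drop the temporary table.
-- #pp_corrected, corr_start, corr_end, ind_corr_times and result_table04
-- are hypothetical names.
DECLARE @offset_start INT;
SET @offset_start = -60;  -- one minute = one sample before
DECLARE @offset_end INT;
SET @offset_end = +60;    -- one minute = one sample after

-- Step 1: copy the plan rows together with the precomputed boundaries.
SELECT pp.order_id,
       pp.production_line,
       pp.prod_start,
       pp.prod_end,
       DATEADD(second, @offset_start, pp.prod_start) AS corr_start,
       DATEADD(second, @offset_end,   pp.prod_end)   AS corr_end
INTO #pp_corrected
FROM production_plan AS pp;

-- Step 2: index the corrected columns, mirroring ind_times.
CREATE INDEX ind_corr_times
ON #pp_corrected (production_line ASC, corr_start ASC, corr_end ASC);

-- Step 3: the same join as before, but on plain columns (no DATEADD here).
SELECT c.order_id,
       c.prod_start,
       c.prod_end,
       t.UTC,
       t.temperature_1
INTO result_table04
FROM #pp_corrected AS c
JOIN temperatures AS t
  ON c.production_line = t.production_line
 AND t.UTC BETWEEN c.corr_start
               AND c.corr_end
ORDER BY t.UTC;

-- Step 4: clean up.
DROP TABLE #pp_corrected;
```

The assumption behind the sketch is that the optimizer can treat corr_start and corr_end like the original prod_start and prod_end columns once they are stored and indexed; I do not know whether that actually holds.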