sql - Inefficient SQL query with a DATETIME calculation. How to optimize?

Question

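The problem comes from a real environment, where the production_plan table captures the order identification and other details in each row. Each row is updated when production of the product starts and again when it ends, to capture the UTC time of each event.

A separate table, temperatures, collects several temperatures on the production line at regular intervals, independently of anything else, and stores them together with the UTC time.

The goal is to extract the sequence of temperatures measured during the production of each product. (The temperatures should then be processed, and a chart of the values created and attached to the product item documentation for audit purposes.)

Updated after marc_s's comment. The original question did not consider any indexes. The updated text takes them into account. The original measurements are mentioned in the comments.

The tables and indexes were created as follows: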

CREATE TABLE production_plan (
        order_id nvarchar(50) NOT NULL,
        production_line uniqueidentifier NULL,
        prod_start DATETIME NULL,
        prod_end DATETIME NULL
);

-- About 31 000 rows inserted, ordered by order_id.
...

-- Clustered index on order_id.
CREATE CLUSTERED INDEX ind_order_id
ON production_plan (order_id ASC);

-- Non-clustered indices on the other columns.
CREATE INDEX ind_times
ON production_plan (production_line ASC, prod_start ASC, prod_end ASC);

------------------------------------------------------

-- There are actually more temperatures for one time (i.e. more
-- sensors). The UTC is the real time of the row insertion, hence
-- the primary key.
CREATE TABLE temperatures (
        UTC datetime PRIMARY KEY NOT NULL,
        production_line uniqueidentifier NULL,
        temperature_1 float NULL  
);

-- About 91 000 rows inserted ordered by UTC.
...

-- Clustered index on UTC is created automatically
-- because of the PRIMARY KEY. Indices on temperature(s)
-- do not make sense.

-- Non-clustered index for production_line
CREATE INDEX ind_pl
ON temperatures (production_line ASC);

-- The tables were created, the records inserted, and the indices
-- created in less than 1 second (for the sample on my computer).

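The idea is to join the tables first on the production_line identifier, and second on the temperature UTC time falling between the UTC times of the start and the end of the item's production: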

-- About 45 000 rows in about 24 seconds when no indices were used.
-- The same took less than one second with the indices (for my data
-- and my computer).
SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table02
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end
  ORDER BY t.UTC;

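About 24 seconds is not acceptable. Clearly, the indexes were necessary. With them, the same operation took less than 1 second (the time shown in the yellow line below the Results tab in Microsoft SQL Server Management Studio).

However...

The second problem remains

Because the temperature measurements are not very frequent, and because the measurement location is slightly shifted in time relative to the start of production, a time correction must be made. In other words, two offsets must be added to the boundaries of the time range. I ended up with a query like this: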

-- About 46 000 rows in about 9 minutes without indices.
-- It took about the same also with indices 
-- (8:50 instead of 9:00 or so).
DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table03
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                          AND DATEADD(second, @offset_end, pp.prod_end)
  ORDER BY t.UTC;

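With the DATEADD() calculation, it takes about 9 minutes, almost regardless of whether the indexes exist.

Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (the UTC values with the offsets added) need their own index to be processed efficiently. I thought of creating a temporary table. An index could then be created on its corrected columns. One more JOIN against that table should help. Then the table can be dropped.

Is the basic idea of the temporary table correct? Is there another technique to do this?

Thanks for your suggestions. I will update the timing results after introducing the indexes you suggest. Please explain the reason for the expected improvement. I am a beginner with respect to hands-on experience in writing SQL solutions.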

score 2 · Accepted Answer

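You can usually optimize a query by:

picking a good clustering key on your tables, good being: narrow, unique, static, ever-increasing. INT IDENTITY is the classic good key; GUIDs are a very bad example (since they cause excessive index fragmentation; read Kim Tripp's "GUIDs as PRIMARY KEYs and/or the clustering key" for more details)
making sure all foreign key columns in child tables are indexed, so that JOINs and lookups are faster
selecting as few columns as you really need (you seem to be doing that fine)
trying to cover the query, i.e. creating an index on the relevant table that contains all the necessary columns, either directly as index columns or as included columns (SQL Server 2005 and newer)
possibly adding extra indexes to speed up range queries and/or to help with sorting/ordering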
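As a rough illustration of the first point, here is a minimal sketch of a table keyed on an INT IDENTITY column. The table name production_plan_demo, the column plan_id and the constraint name are made up for this example; the remaining columns are copied from the question.

```sql
-- Hypothetical demo table: same columns as production_plan in the question,
-- plus a narrow, unique, static, ever-increasing surrogate key.
CREATE TABLE production_plan_demo (
        plan_id INT IDENTITY(1,1) NOT NULL,   -- hypothetical surrogate key
        order_id nvarchar(50) NOT NULL,
        production_line uniqueidentifier NULL,
        prod_start DATETIME NULL,
        prod_end DATETIME NULL,
        -- The primary key doubles as the clustering key here.
        CONSTRAINT PK_production_plan_demo PRIMARY KEY CLUSTERED (plan_id)
);
```

New rows are then appended at the end of the clustered index, which is what keeps fragmentation low compared with a GUID key.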

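Looking at your query and table definitions:

I don't seem to see any primary keys. Add those!
you have to make sure there is a foreign key index on pp.production_line (assuming that t.production_line is the primary key of that other table)
you should see whether you can find a good index to handle the range query on t.UTC
you should check whether it makes sense to create an index on production_plan2 that contains all the columns (order_id, pp.prod_start, pp.prod_end)
you should check whether it makes sense to create an index on temperatures2 that contains all the columns (UTC, temperature_1)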
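The covering indexes from the last two points could look roughly like the sketch below. It assumes the table and column names from the question (production_plan and temperatures); the index names are invented for the example, and whether the optimizer actually picks these indexes has to be checked against the execution plan.

```sql
-- Hypothetical index names; tables and columns are taken from the question.

-- Seek on production_line, range-scan on UTC, and read temperature_1
-- from the index leaf level instead of going back to the clustered index.
CREATE NONCLUSTERED INDEX ix_temperatures_line_utc
    ON temperatures (production_line ASC, UTC ASC)
    INCLUDE (temperature_1);

-- Covers the production_plan side of the join. order_id is the clustering
-- key, so it is carried in every non-clustered index anyway; listing it in
-- INCLUDE only makes the intent explicit.
CREATE NONCLUSTERED INDEX ix_production_plan_line_times
    ON production_plan (production_line ASC, prod_start ASC, prod_end ASC)
    INCLUDE (order_id);
```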

Update: you can capture the actual execution plan by enabling that option on the SSMS toolbar, or from the menu under Query > Include Actual Execution Plan.
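If you prefer to stay in the query window, a similar measurement can be sketched with session settings. This is only an illustration: it reuses the first query from the question without the INTO clause, so nothing is written to a result table.

```sql
-- Report CPU/elapsed time and page reads in the Messages tab,
-- and return the actual execution plan as an extra XML result set.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SET STATISTICS XML ON;

SELECT pp.order_id,
       pp.prod_start,
       pp.prod_end,
       t.UTC,
       t.temperature_1
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end
  ORDER BY t.UTC;

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads before and after adding an index is usually more telling than the elapsed time alone.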

score 1 · Accepted Answer

Computed columns can help you: http://msdn.microsoft.com/en-us/library/ms189292%28v=sql.105%29.aspx

ALTER TABLE production_plan ADD 
        offset_start int NOT NULL CONSTRAINT DF__production_plan__offset_start DEFAULT 0,
        offset_end int NOT NULL CONSTRAINT DF__production_plan__offset_end DEFAULT 0,
        prod_start_UTC as CAST(DATEADD(second,offset_start,prod_start) as DATETIME) PERSISTED  NOT NULL ,
        prod_end_UTC as CAST(DATEADD(second,offset_end,prod_end) as DATETIME) PERSISTED  NOT NULL

-- or just
--ALTER TABLE production_plan ADD 
--        prod_start_UTC as CAST(DATEADD(second,-60,prod_start) as DATETIME) PERSISTED  NOT NULL ,
--        prod_end_UTC as CAST(DATEADD(second,60,prod_end) as DATETIME) PERSISTED  NOT NULL

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[temperatures]') AND name = N'ind_pl')
    DROP INDEX [ind_pl] ON [dbo].[temperatures] WITH ( ONLINE = OFF )

CREATE INDEX ind_times_UTC
ON production_plan (production_line ASC, prod_start_UTC ASC, prod_end_UTC ASC);

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table05
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start_UTC
                          AND pp.prod_end_UTC
ORDER BY t.UTC;

Plus the suggestions made by marc_s.
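To double-check that the computed columns were created as intended, a quick look at the catalog may help. This is a sketch that assumes the table lives in the dbo schema, as the DROP INDEX statement above suggests.

```sql
-- List the computed columns of production_plan with their definitions
-- and whether they are stored (persisted) in the table.
SELECT c.name,
       c.definition,
       c.is_persisted,
       c.is_nullable
  FROM sys.computed_columns AS c
 WHERE c.object_id = OBJECT_ID(N'dbo.production_plan');
```

Both prod_start_UTC and prod_end_UTC should show is_persisted = 1; only then are the shifted values stored in the table rather than recomputed when read.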

score 1 · Accepted Answer

Things to try:

CREATE INDEX ind_pl
    ON temperatures (production_line ASC, UTC);

will provide a covering index for the join.

With a non-equi join, APPLY (SQL Server 2005+) may be faster:

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table02
  FROM production_plan AS pp
 CROSS APPLY
 (
   SELECT t1.utc, t1.temperature_1
     FROM temperatures AS t1
    WHERE t1.production_line = pp.production_line
      AND t1.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                     AND DATEADD(second, @offset_end, pp.prod_end)
 ) t
 ORDER BY t.UTC;

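If this doesn't work, the next option is to write a stored procedure that declares two cursors, one for pp and one for t, and advances one side at a time while inserting the matches into a temporary table, which ensures that each table is read only once. This technique can get quite complicated because of the n:m relationship. But if the above doesn't work for you, I'd be happy to give it a try.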

score 1 · Accepted Answer

I tried the following solution with a temporary table:

-- UTC range expanded by the offsets -- temporary table used.
-- (Much better -- less than one second.)

DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

-- Temporary table with the production_plan UTC range expanded.
SELECT production_line,
       order_id,
       prod_start,
       prod_end,
       DATEADD(second, @offset_start, prod_start) AS start,
       DATEADD(second, @offset_end, prod_end) AS bend
  INTO #pp     
  FROM production_plan;

CREATE INDEX ind_UTC
  ON #pp (production_line ASC, start ASC, bend ASC);

SELECT order_id,
       prod_start,
       prod_end,
       UTC,
       temperature_1
  INTO result_table06
  FROM #pp JOIN temperatures AS t
             ON #pp.production_line = t.production_line
                AND UTC BETWEEN #pp.start AND #pp.bend
  ORDER BY UTC;

DROP TABLE #pp;

CREATE CLUSTERED INDEX ind_UTC
  ON result_table06 (UTC ASC);

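The result was ready in less than one second (compared with 9 minutes). But I would like to hear your criticism. One open question is how efficient it will be if the temperatures table grows into a large table.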
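One sanity check worth running before trusting the faster variant is to compare its output with the slow DATEADD() query. The sketch below assumes that result_table03 (from the question) and result_table06 (from the code above) both still exist and were produced with the same offsets.

```sql
-- Rows produced by the slow query but missing from the temp-table variant.
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table03
EXCEPT
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table06;

-- Rows produced only by the temp-table variant.
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table06
EXCEPT
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table03;

-- EXCEPT ignores duplicate rows, so compare the row counts as well.
SELECT (SELECT COUNT(*) FROM result_table03) AS rows_03,
       (SELECT COUNT(*) FROM result_table06) AS rows_06;
```

Two empty result sets and equal counts would suggest that both approaches return the same data.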

score 0 · Accepted Answer

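This is about your second problem.

I haven't checked its performance, but you could try to avoid the DATEADD function by replacing it with the addition or subtraction of a constant floating-point number.

For example, if you want to add one minute you can use: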

select getdate()+1.000/(24.00*60.00)

或者用一个常数：

select getdate()+0.000694444

As you can see, adding 1 (one) adds exactly 1 day. So this will not be exactly 60 seconds, but perhaps that does not matter in this case?
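The arithmetic behind the constant: one day has 24 * 60 * 60 = 86400 seconds, so 60 seconds correspond to 60 / 86400 ≈ 0.000694444 of a day. A small sketch to compare both ways of shifting a value follows; the variable @t and its sample value are arbitrary and only serve the illustration. Note that this kind of arithmetic works for the DATETIME type used in the question, but not for DATETIME2.

```sql
DECLARE @t DATETIME;
SET @t = '2012-01-01T12:00:00';  -- arbitrary sample value

SELECT DATEADD(second, 60, @t)  AS via_dateadd,
       @t + 60.0 / 86400.0      AS via_arithmetic,
       -- difference between the two results in milliseconds
       DATEDIFF(millisecond,
                DATEADD(second, 60, @t),
                @t + 60.0 / 86400.0) AS diff_ms;
```

Any difference shown in diff_ms comes from the rounding of the fractional constant and from the resolution of DATETIME.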
