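TLDR Version
You are clearly missing an index that would help this query. Adding the missing index alone will likely give an order of magnitude improvement.
If you are on SQL Server 2012, rewriting the query using LEAD would also do it (though this too will benefit from the missing index).
If you are still on 2005/2008 then you can make some improvements to your existing query, but the effect will be relatively minor compared to the index change.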
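Longer Version
For this to take 3 minutes I assume that you have no useful indexes at all, and that the biggest win would be simply to add one. (For a report run once a month, copying the data from the three columns into an appropriately indexed #temp table might suffice if you don't want to create a permanent index.)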
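The #temp table option could look roughly like the sketch below. The source table name dbo.YourValidationTable and the names #Validations and IX_Validations are placeholders invented for illustration; only the three column names come from the question. A clustered index is used so that ID_PLACE is carried in the index without a separate INCLUDE.

```sql
-- Copy only the three columns the report needs.
-- dbo.YourValidationTable is a hypothetical name: substitute the real table.
SELECT ID_TICKET,
       VALIDATION_TIMESTAMP,
       ID_PLACE
INTO   #Validations
FROM   dbo.YourValidationTable;

-- Index the copy so rows for a ticket can be read in timestamp order.
CREATE CLUSTERED INDEX IX_Validations
  ON #Validations (ID_TICKET, VALIDATION_TIMESTAMP);

-- ... run the report query against #Validations here ...

DROP TABLE #Validations;
```

The copy and index build are paid on every run, which is why this suits an occasional report better than a frequently executed query.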
You say that the table has been simplified for clarity and that it has 40K rows. Assuming the following test data
CREATE TABLE TestDuration
  (
     Id                   UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
     VALIDATION_TIMESTAMP DATETIME,
     ID_TICKET            BIGINT,
     ID_PLACE             BIGINT,
     OtherColumns         CHAR(100) NULL
  )

INSERT INTO TestDuration
            (VALIDATION_TIMESTAMP,
             ID_TICKET,
             ID_PLACE)
SELECT TOP 40000 DATEADD(minute, ROW_NUMBER() OVER (ORDER BY (SELECT 0)), GETDATE()),
                 ABS(CHECKSUM(NEWID())) % 10,
                 ABS(CHECKSUM(NEWID())) % 100
FROM   master..spt_values v1,
       master..spt_values v2
Your original query took 51 seconds on my machine at MAXDOP 1, with the following IO statistics
Table 'Worktable'. Scan count 79990, logical reads 1167573, physical reads 0
Table 'TestDuration'. Scan count 3, logical reads 2472, physical reads 0.
For each of the 40,000 rows in the table it is doing two sorts of all the matching ID_TICKET rows in order to identify the next one in VALIDATION_TIMESTAMP order.
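For anyone wanting to reproduce this kind of measurement, IO output in the format quoted above is what SET STATISTICS IO produces, and SET STATISTICS TIME reports elapsed and CPU time alongside it. The sketch below is one way to collect both; the SELECT is only a placeholder for the query under test, and whether the timings in this answer were captured exactly this way is an assumption.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder query: substitute the query being measured.
SELECT COUNT(*)
FROM   TestDuration
OPTION (MAXDOP 1); -- serial plan, to match the MAXDOP 1 timing quoted above

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

In Management Studio the figures appear on the Messages tab rather than in the result grid.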
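Simply adding an index as below brings the elapsed time down to 406 ms, an improvement of more than 100 times (subsequent queries in this answer assume that this index is now in place).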
CREATE NONCLUSTERED INDEX IX
  ON TestDuration(ID_TICKET, VALIDATION_TIMESTAMP)
  INCLUDE (ID_PLACE)
In the plan the 80,000 sort and spool operations are now replaced by index seeks.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'TestDuration'. Scan count 79991, logical reads 255707, physical reads 0
It is still doing 2 seeks for every row, however. Rewriting with CROSS APPLY allows these to be combined.
SELECT VisitDurationCalcTable.ID_PLACE            AS ID_PLACE_IN,
       VisitDurationCalcTable.ID_NEXT_VISIT_PLACE AS ID_PLACE_OUT,
       COUNT(visitduration)                       AS NUMBER_OF_VISITS,
       AVG(visitduration)                         AS AVERAGE_VISIT_DURATION
FROM   (SELECT EntryData.VALIDATION_TIMESTAMP,
               EntryData.ID_TICKET,
               EntryData.ID_PLACE,
               CA.ID_PLACE AS ID_NEXT_VISIT_PLACE,
               DATEDIFF(n, EntryData.VALIDATION_TIMESTAMP, CA.VALIDATION_TIMESTAMP) AS visitduration
        FROM   TestDuration EntryData
               CROSS APPLY (SELECT TOP 1 ID_PLACE,
                                         VALIDATION_TIMESTAMP
                            FROM   TestDuration
                            WHERE  ID_TICKET = EntryData.ID_TICKET
                                   AND VALIDATION_TIMESTAMP > EntryData.VALIDATION_TIMESTAMP
                            ORDER  BY VALIDATION_TIMESTAMP ASC) CA) AS VisitDurationCalcTable
GROUP  BY VisitDurationCalcTable.ID_PLACE,
          VisitDurationCalcTable.ID_NEXT_VISIT_PLACE
This gives me an elapsed time of 269 ms
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'TestDuration'. Scan count 40001, logical reads 127988, physical reads 0
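Whilst the number of reads is still quite high, the seeks are all reading pages that have just been read by the scan, so they are all pages in cache. The number of reads can be reduced by using a table variable.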
DECLARE @T TABLE (
  VALIDATION_TIMESTAMP DATETIME,
  ID_TICKET            BIGINT,
  ID_PLACE             BIGINT,
  RN                   INT,
  PRIMARY KEY(ID_TICKET, RN) )

INSERT INTO @T
SELECT VALIDATION_TIMESTAMP,
       ID_TICKET,
       ID_PLACE,
       ROW_NUMBER() OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP) AS RN
FROM   TestDuration

SELECT T1.ID_PLACE AS ID_PLACE_IN,
       T2.ID_PLACE AS ID_PLACE_OUT,
       COUNT(*)    AS NUMBER_OF_VISITS,
       AVG(DATEDIFF(n, T1.VALIDATION_TIMESTAMP, T2.VALIDATION_TIMESTAMP)) AS AVERAGE_VISIT_DURATION
FROM   @T T1
       INNER MERGE JOIN @T T2
         ON T1.ID_TICKET = T2.ID_TICKET
            AND T2.RN = T1.RN + 1
GROUP  BY T1.ID_PLACE,
          T2.ID_PLACE
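For me at least, however, that slightly increased the elapsed time to 301 ms (43 ms for the insert + 258 ms for the select), but this could still be a good option in lieu of creating a permanent index.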
(Insert)
Table 'TestDuration'. Scan count 1, logical reads 233, physical reads 0
(Select)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table '#0C50D423'. Scan count 2, logical reads 372, physical reads 0
Finally, if you are using SQL Server 2012 you can use LEAD (SQL Fiddle)
WITH CTE
     AS (SELECT ID_PLACE AS ID_PLACE_IN,
                LEAD(ID_PLACE) OVER (PARTITION BY ID_TICKET
                                     ORDER BY VALIDATION_TIMESTAMP) AS ID_PLACE_OUT,
                DATEDIFF(n,
                         VALIDATION_TIMESTAMP,
                         LEAD(VALIDATION_TIMESTAMP) OVER (PARTITION BY ID_TICKET
                                                          ORDER BY VALIDATION_TIMESTAMP)) AS VISIT_DURATION
         FROM   TestDuration)
SELECT ID_PLACE_IN,
       ID_PLACE_OUT,
       COUNT(*)            AS NUMBER_OF_VISITS,
       AVG(VISIT_DURATION) AS AVERAGE_VISIT_DURATION
FROM   CTE
WHERE  ID_PLACE_OUT IS NOT NULL
GROUP  BY ID_PLACE_IN,
          ID_PLACE_OUT
This gives me an elapsed time of 249 ms
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'TestDuration'. Scan count 1, logical reads 233, physical reads 0
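The LEAD version also performs well without the index. Omitting the optimal index adds an additional SORT to the plan and means it has to read the wider clustered index on my test table, but it still completed in an elapsed time of 293 ms.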
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'TestDuration'. Scan count 1, logical reads 824, physical reads 0
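To make the pairing that LEAD performs concrete, here is a small illustration on three made-up rows for a single ticket. The sample values are invented for this sketch and are not part of the test data above; it needs SQL Server 2012 or later.

```sql
-- Three hypothetical validations for ticket 1, at places 10, 20 and 30.
SELECT ID_PLACE AS ID_PLACE_IN,
       LEAD(ID_PLACE) OVER (PARTITION BY ID_TICKET
                            ORDER BY VALIDATION_TIMESTAMP) AS ID_PLACE_OUT,
       DATEDIFF(minute,
                VALIDATION_TIMESTAMP,
                LEAD(VALIDATION_TIMESTAMP) OVER (PARTITION BY ID_TICKET
                                                 ORDER BY VALIDATION_TIMESTAMP)) AS VISIT_DURATION
FROM   (VALUES (1, 10, CAST('2012-01-01T10:00:00' AS DATETIME)),
               (1, 20, CAST('2012-01-01T10:30:00' AS DATETIME)),
               (1, 30, CAST('2012-01-01T11:15:00' AS DATETIME)))
       AS V(ID_TICKET, ID_PLACE, VALIDATION_TIMESTAMP)
ORDER  BY VALIDATION_TIMESTAMP;

-- ID_PLACE_IN  ID_PLACE_OUT  VISIT_DURATION
-- 10           20            30
-- 20           30            45
-- 30           NULL          NULL
```

The last validation for a ticket has no following row, so LEAD returns NULL for it. That is the row the WHERE ID_PLACE_OUT IS NOT NULL filter in the full query removes before grouping.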