我有 3 个版本的查询最终返回相同的结果。
当向相对较小的表中添加额外的内部连接并且在 where 子句中使用参数变量时,其中一个会变得非常慢。
快速查询和慢速查询(包含在每个查询下方)的执行计划非常不同。
我想了解为什么会发生这种情况以及如何预防。
此查询需要 < 1 秒。它没有额外的内部连接,但它在 where 子句中使用参数变量。
declare @start datetime = '20120115'
declare @end datetime = '20120116'
select distinct sups.campaignid
from tblSupporterMainDetails sups
inner join tblCallLogs calls on sups.supporterid = calls.supporterid
where calls.callEnd between @start and @end
|--Parallelism(Gather Streams)
|--Sort(DISTINCT ORDER BY:([sups].[campaignID] ASC))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
|--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
|--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
|--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1004]))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
| |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
|--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), WHERE:(PROBE([Bitmap1004],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))
此查询需要 < 1 秒。它有一个额外的内部连接,但在 where 子句中使用参数常量。
select distinct camps.campaignid
from tblCampaigns camps
inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
inner join tblCallLogs calls on sups.supporterid = calls.supporterid
where calls.callEnd between '20120115' and '20120116'
|--Parallelism(Gather Streams)
|--Hash Match(Right Semi Join, HASH:([sups].[campaignID])=([camps].[campaignID]))
|--Bitmap(HASH:([sups].[campaignID]), DEFINE:([Bitmap1007]))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
| |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
| |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
| |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1006]))
| | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
| | |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
| |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), WHERE:(PROBE([Bitmap1006],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([camps].[campaignID]))
|--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]), WHERE:(PROBE([Bitmap1007],[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID],N'[IN ROW]')))
此查询需要 2 分钟。它有一个额外的内部连接并且它在 where 子句中使用参数变量。
declare @start datetime = '20120115'
declare @end datetime = '20120116'
select distinct camps.campaignid
from tblCampaigns camps
inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
inner join tblCallLogs calls on sups.supporterid = calls.supporterid
where calls.callEnd between @start and @end
|--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
|--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
|--Top(TOP EXPRESSION:((1)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([calls].[callID], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
| |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
| |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterID_closingCall] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[AUTOGEN_PK_tblCallLogs] AS [calls]), SEEK:([calls].[callID]=[GOGEN].[dbo].[tblCallLogs].[callID] as [calls].[callID]), WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) LOOKUP ORDERED FORWARD)
笔记:
- 我相信缓慢是由 Clustered Index Seek on 引起的
tblCallLogs
,但是我不知道为什么 SQL Server 会选择这个执行计划。 - 我应该使用查询优化器提示吗?我需要并且不愿意告诉 SQL Server 如何完成它的工作......
- 该问题似乎是由多种因素引起的 - 额外的连接和变量。
- 执行计划在为查询找到变量时是否会尝试重用“坏”计划?
- 在现实生活中,我将不得不使用参数变量。常数不好!所以这个问题可能存在于我的许多查询/存储过程中!
- 我已经在 和 上重建了索引并更新了统计
tblCampaigns
数据tblSupporterMainDetails
。这没有效果。 - 两个表在主键(标识整数)上都有聚集索引。
- 外键列
campaignid
被索引。 - 所有查询都使用相同的参数值——不管它被用作变量还是常量。
表中的记录数:
- tblSupporterMainDetails = 12,561,900
- tblCallLogs = 27,242,224
- tblCampaigns = 756
更新:
- 我还重建了索引并更新了
tblcalllogs
. 没有效果。 - 我已经使用清除了执行计划缓存
DBCC FREEPROCCACHE
- tblCallLogs.callEnd 是一个日期时间。
涉及列的架构:
tblCampaign.campaignid int not null
tblSupporterMainDetails.campaignid int not null
tblSupporterMainDetails.supporterid int not null
tblCallLogs.supporterid int not null
tblCallLogs.callEnd datetime not null
索引:
更新 2:将索引添加到 tblCallLogs.supporterId 后 - 包含列: callEnd
“慢”查询加速到 40 秒。更新的执行计划:
|--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
|--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
|--Top(TOP EXPRESSION:((1)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
|--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
|--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterid_callend] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]), WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) ORDERED FORWARD)
解决方案:
额外的连接实际上并没有直接导致问题,但它显然改变了语句,以便 sql server 为其持有不同的执行计划。
通过将 OPTION(RECOMPILE) 添加到慢语句的末尾,我能够获得预期的快速性能。即<1秒。我仍然不确定这个解决方案是否有效 - 为什么不刷新所有计划工作?这是参数嗅探的经典案例吗?当我知道确切的答案时,我会更新这篇文章 - 或者直到有人能给出明确的答案。感谢@LievenKeersmaekers 和@JNK 到目前为止提供的帮助......