简介和背景
我不得不优化一个简单的查询(下面的例子)。在重写了几次之后,我认识到一个相同索引操作的估计行数会因查询的编写方式而异。
最初该查询进行了聚集索引扫描,因为生产中的表包含一个二进制列,该表非常大(大约 100 GB)并且执行全表扫描需要太多时间。
问题
为什么同一索引操作的估计行数不同(示例将显示)?优化器在这里做什么?
示例数据库 - 我使用的是 SQL Server 2008 R2
我试图创建一个非常简单的生产表版本来显示行为。
-- CREATE THE SAMPLE TABLES
----------------------------
CREATE TABLE dbo.MasterTable(
MasterId smallint NOT NULL,
Name varchar(5) NOT NULL,
CONSTRAINT PK_MasterTable PRIMARY KEY CLUSTERED (MasterId ASC)
) ON [PRIMARY]
GO
CREATE TABLE dbo.DetailTable(
DetailId bigint IDENTITY(1,1) NOT NULL,
MasterId smallint NOT NULL,
Name nvarchar(50) NOT NULL,
CreateDate datetime NOT NULL,
CONSTRAINT PK_DetailTable PRIMARY KEY CLUSTERED (DetailId ASC)
) ON [PRIMARY]
GO
ALTER TABLE dbo.DetailTable
ADD CONSTRAINT FK1
FOREIGN KEY(MasterId) REFERENCES dbo.MasterTable (MasterId)
GO
CREATE NONCLUSTERED INDEX IX_DetailTable
ON dbo.DetailTable( MasterId ASC, Name ASC )
GO
-- INSERT SOME SAMPLE DATA
----------------------------
SET NOCOUNT ON
GO
-- These are some Codes. In our system we always use these codes to search for "types" of data.
INSERT INTO dbo.MasterTable (MasterId, Name)
VALUES (1, 'N1'), (2, 'N2'), (3, 'N3'), (4, 'N4'), (5, 'N5'), (6, 'N6'), (7, 'N7'), (8, 'N8')
GO
-- ADD ROWS TO THE DETAIL TABLE
-- Takes about 1 minute to run
-- Don't care about the logic, it's just to get a distribution similar to production system
----------------------------
declare @x int = 1
DECLARE @MasterID INT
while (@x <= 400000)
begin
SET @MasterID = ABS(CHECKSUM(NEWID())) % 8 + 1
INSERT INTO dbo.DetailTable(MasterId,Name,CreateDate)
VALUES(
CASE
WHEN @MasterID IN (1, 3, 4) AND @x % 20 != 0 THEN 2
WHEN @MasterID IN (5, 6) AND @x % 20 != 0 THEN 7
WHEN @MasterID = 8 AND @x % 100 != 0 THEN 7
ELSE @MasterID
END,
NEWID(),
DATEADD(DAY, - ABS(CHECKSUM(NEWID())) % 1000, GETDATE())
)
SET @x = @x + 1
end
go
-- DO THE INDEX AND STATISTIC MAINTENANCE
----------------------------
alter index all on dbo.DetailTable reorganize
alter index all on dbo.MasterTable reorganize
update statistics dbo.DetailTable WITH FULLSCAN
update statistics dbo.MasterTable WITH FULLSCAN
go
准备工作做好了,我们开始查询
我们先来看看统计,看看RANGE_HI_KEY=8
,有 489 个 EQ_ROWS
-- CHECK THE STATISTICS
----------------------------
dbcc show_statistics ('dbo.DetailTable', IX_DetailTable)
GO
现在我们进行查询。第一个是我必须优化的原始查询。执行时请激活当前执行计划。看一下操作“index seek (nonclustered) [DetailTable].[IX_DetailTable]”
-- ORIGINAL QUERY
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- FORCESEEK
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated 50.000
-- TABLE VARIABLE
----------------------------
DECLARE @MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO @MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN @MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated 40.000
-- TEMP TABLE
----------------------------
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
-- Actual 489, Estimated 489
DROP TABLE #MasterId
GO
分析和最终问题
请看一下操作“index seek (nonclustered) [DetailTable].[IX_DetailTable]”
上面脚本中的注释显示了我得到的估计和实际行数的值。
在我们的生产环境中,该表有 3300 万行,上述查询中的估计行数从 300 万到 1600 万不等。
总结一下:
当在 DetailTable 和 MasterTable 之间进行连接时,估计的行数是 12.5%(主表中有 8 个值,这是有道理的,有点……)
当在 DetailTable 和表变量之间进行连接时,估计的行数为 10%
当在 DetailTable 和临时表之间进行连接时,估计的行数与实际行数完全相同
问题是为什么这些值不同?
统计数据是最新的,做出估计应该很容易。
我只是想明白这一点。