sql-server - How does SQL Server cost an execution plan that contains a user-defined function?

Question

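I have a stored procedure that filters on the result of the DATEADD function. My understanding is that this is similar to using a user-defined function: SQL Server cannot keep statistics on the output of the function, so it has trouble evaluating the execution plan.

The query looks a bit like this: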

SELECT /* Columns */ FROM
TableA JOIN TableB
ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now

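(So it is not possible to pre-compute the result of the DATEADD.)

What I'm seeing is a terrible execution plan, which I believe is caused by SQL Server wrongly estimating the number of rows returned from one part of the tree as 1, when in fact it is around 65,000. However, I have seen the same stored procedure execute in a fraction of the time when different (not necessarily less) data is present in the database.

My question is: in cases like this, how does the query optimizer estimate the result of the function?

Update: FYI, I'm more interested in understanding why I sometimes get a good execution plan and other times I don't. I already have a pretty good idea of how I'm going to fix this in the long term.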

score 3 · Accepted Answer

It's not the costing of the plan that's the problem here. The function on the columns prevents SQL Server from doing index seeks, so you're going to get an index scan or a table scan.

What I'd suggest is to see if you can get one of the columns out of the function: basically, see if you can move the function to the other side of the comparison. It's not perfect, but it means that at least one column can be used for an index seek.

Something like this (rough idea, not tested), with an index on TableB.HoursDifferent and then an index on the join column in TableA:

-- DATEDIFF takes (datepart, start, end): hours from StartDate up to @Now
DATEDIFF(hour, TableA.StartDate, @Now) >= TableB.HoursDifferent
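A sketch of how a rewrite along these lines might sit inside the full query, assuming the table and column names from the question, an integer HoursDifferent, and a placeholder column list. DATEDIFF(hour, start, end) counts hour boundaries crossed rather than whole elapsed hours, so by itself it can let through rows that the DATEADD test would reject. In this sketch it is therefore only a coarse pre-filter that leaves TableB.HoursDifferent bare, and the original predicate is kept so the results do not change. Whether the optimiser actually picks a seek still depends on the indexes and the data, so check the plan.

```sql
-- Sketch only: table and column names are taken from the question;
-- the column list and the value of @Now are placeholders.
DECLARE @Now DATETIME = GETDATE();

SELECT TableA.id, TableB.join_id
FROM TableA
JOIN TableB
  ON TableA.id = TableB.join_id
WHERE
  -- Coarse filter: HoursDifferent stands alone on one side of the comparison
  TableB.HoursDifferent <= DATEDIFF(hour, TableA.StartDate, @Now)
  -- Exact filter: the original predicate, kept so the result set is unchanged
  AND DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now;
```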

On the costing side, I suspect that the optimiser will use the 30% of the table 'thumb-suck' because it can't use statistics to get an accurate estimate and because it's an inequality. Meaning it's going to guess that 30% of the table will be returned by that predicate.

It's really hard to say anything for sure without seeing the execution plans. You mention an estimate of 1 row and an actual of 65000. In some cases, that's not a problem at all. http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/
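One way to put the estimated and actual row counts side by side without opening the graphical plan is SET STATISTICS PROFILE. The sketch below assumes the tables from the question and a placeholder column list. As I understand it, the Rows column is the total across all executions of an operator while EstimateRows is per execution, so compare them together with the Executes column before deciding the estimate is wrong.

```sql
-- Sketch only: names come from the question; the column list is a placeholder.
DECLARE @Now DATETIME = GETDATE();

-- After the query result, return one row per plan operator,
-- including the Rows, Executes and EstimateRows columns.
SET STATISTICS PROFILE ON;

SELECT TableA.id, TableB.join_id
FROM TableA
JOIN TableB
  ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now;

SET STATISTICS PROFILE OFF;
```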

score 1 · Accepted Answer

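@Kragen,

Short answer: if you're running queries with ten tables, get used to it. You need to learn all about query hints, and a lot more tricks besides.

Long answer:

SQL Server generally produces excellent query plans only for up to about three to five tables. In my experience, once you go beyond that, you're basically going to have to write the query plan yourself, using all the index and join hints. (Also, scalar functions seem to be estimated at Cost=0, which is just insane.)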
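For anyone who hasn't used them, this is roughly what index and join hints look like on the query from the question. It is a sketch only: IX_TableB_join_id is a hypothetical index name, the column list is a placeholder, and a nested loops join is picked purely to show the syntax, not because it is the right choice for this data.

```sql
-- Sketch only: IX_TableB_join_id is a hypothetical index name.
DECLARE @Now DATETIME = GETDATE();

SELECT TableA.id, TableB.join_id
FROM TableA
INNER LOOP JOIN TableB WITH (INDEX (IX_TableB_join_id))  -- join hint + index hint
  ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now
OPTION (FORCE ORDER);  -- keep the join order as written in the FROM clause
```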

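The reason is that it's just too damn complicated after that. The query optimizer has to decide algorithmically what to do, and there are too many possible combinations for even the brightest geniuses on the SQL Server team to create a truly universal algorithm.

They say the optimizer is smarter than you. That may be true. But you have one advantage: if it doesn't work, you can throw it out and try again! By about the sixth try you should have something acceptable, even for a ten-table join, if you know the data. The query optimizer can't do that; it has to come up with some kind of plan immediately, and it gets no second chance.

My favorite trick is to force the order of the where clause by converting it to a case statement. Instead of: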

WHERE
predicate1
AND predicate2
AND....

Use this:

WHERE
case 
when not predicate1 then 0
when not predicate2 then 0
when not .... then 0
else 1 end = 1

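Order the predicates from cheapest to most expensive, and you get a result that is logically the same but that SQL Server can't mess around with: it has to evaluate them in the order you say.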
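A concrete sketch of the trick, using a hypothetical #Orders temp table with made-up data. The columns are declared NOT NULL on purpose: if a predicate evaluated to unknown because of a NULL, NOT predicate would also be unknown, the row would fall through to ELSE 1, and the result would no longer match the plain AND version.

```sql
-- Hypothetical table, columns, and data, for illustration only.
CREATE TABLE #Orders (
    id     INT          NOT NULL PRIMARY KEY,
    status TINYINT      NOT NULL,
    notes  VARCHAR(200) NOT NULL
);

INSERT INTO #Orders (id, status, notes)
VALUES (1, 1, 'urgent: ship today'),
       (2, 0, 'urgent: on hold'),
       (3, 1, 'standard delivery');

SELECT id
FROM #Orders
WHERE CASE
        WHEN NOT (status = 1)            THEN 0  -- cheap comparison first
        WHEN NOT (notes LIKE '%urgent%') THEN 0  -- costlier string search second
        ELSE 1
      END = 1;  -- only the row with id = 1 passes both tests

DROP TABLE #Orders;
```

Bear in mind that wrapping the predicates in CASE also hides them from index seeks, so this only makes sense when the predicates could not have used an index anyway.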

score 1 · Accepted Answer

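Seeing the function would help, but one thing I have seen is that burying functions like this inside a query can hurt performance. If you can evaluate some of them beforehand, you may be in better shape. For example, instead of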

WHERE MyDate < GETDATE()

try

DECLARE @Today DATETIME
SET @Today = GETDATE()
...
WHERE MyDate < @Today

This seems to perform better.
