
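A few weeks ago, I asked about the relationship between Oracle execution plan cost and speed in connection with the FIRST_ROWS(n) hint. I've run into a similar issue, but this time it involves the ORDERED hint. When I use the hint, my execution time improves dramatically (by more than 90%), but the EXPLAIN PLAN for the query reports a huge increase in cost. For this particular query, the cost goes from 1500 to 24000.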

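The query is parameterized for pagination and joins 19 tables to get its data. I'd post it here, but it is 585 lines long and is written against a vendor's messy, horrible schema. Unless you happen to be intimately familiar with the product it is used for, seeing it wouldn't help much. However, I gathered schema statistics at 100% shortly before I started tuning the query, so the CBO is not working in the dark.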

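I'll try to summarize what the query does. It essentially returns objects in the system along with their children, and it is structured as a large subquery block joined directly to several tables. The first part returns object IDs and is paginated inside its own query block, before the join to the other tables. That block is then joined to several tables that contain the child IDs.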

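I know the CBO is neither all-knowing nor infallible, but it really bothers me to see an execution plan this costly perform so well; it goes against a lot of what I've been taught. With the FIRST_ROWS hint, the fix was to supply a value n so that the optimizer could reliably generate the execution plan. Is something similar happening with the ORDERED hint in my query?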


2 Answers


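You should not rely on the execution cost to optimize a query. What matters is the execution time (and, in some cases, the resource usage).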

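From the Concepts guide:

"The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan."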

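When the estimate is off, it is usually because the statistics available to the optimizer are misleading. You can correct this by giving the optimizer more accurate statistics. First check that the statistics are up to date. If they are, you can gather additional statistics, for example by enabling dynamic statistics gathering or by manually creating a histogram on a column with skewed data.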
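To make the histogram point concrete, here is a minimal Python sketch (not Oracle code) of why a skewed column misleads an estimator that only knows the row count and the number of distinct values. The column contents and the function names `uniform_estimate` and `histogram_estimate` are made up for illustration; a real optimizer is far more elaborate.

```python
from collections import Counter

def uniform_estimate(values):
    """Estimated rows per value, assuming every distinct value is equally common."""
    return len(values) / len(set(values))

def histogram_estimate(values, target):
    """Estimated rows for `target` using a frequency histogram (one bucket per value)."""
    return Counter(values)[target]

# Hypothetical skewed column: 9,900 'CLOSED' rows and only 100 'OPEN' rows.
status = ["CLOSED"] * 9900 + ["OPEN"] * 100

print(uniform_estimate(status))            # 5000.0
print(histogram_estimate(status, "OPEN"))  # 100
```

Without the histogram, the estimate for `status = 'OPEN'` is 50 times too high, which is the kind of error that can push an optimizer toward a full scan or a poor join order.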

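Another factor that can explain the discrepancy between relative cost and execution time is that the optimizer is built on simple assumptions. For example: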

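  • Without a histogram, the values in a column are assumed to be uniformly distributed
  • An equality operator will select 5% of the rows (without a histogram or dynamic statistics)
  • The data in each column is independent of the data in every other column
  • Furthermore, for queries with bind variables, a single cost is computed and reused for further executions (even if the bind values change, possibly modifying the cardinality of the query)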
  • ...
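As an illustration of the independence assumption in the list above, the following Python sketch multiplies per-column selectivities the way a simple estimator would and compares the result with the true row count. The table contents and the helper names `independent_estimate` and `actual_count` are hypothetical, and this is only a toy model, not how Oracle computes cardinality.

```python
def independent_estimate(rows, **predicates):
    """Multiply per-column selectivities as if the columns were unrelated."""
    total = len(rows)
    selectivity = 1.0
    for column, value in predicates.items():
        matching = sum(1 for row in rows if row[column] == value)
        selectivity *= matching / total
    return total * selectivity

def actual_count(rows, **predicates):
    """Count the rows that really satisfy every equality predicate."""
    return sum(
        1 for row in rows
        if all(row[column] == value for column, value in predicates.items())
    )

# Hypothetical table in which city fully determines country (correlated columns).
rows = (
    [{"city": "Paris", "country": "FR"}] * 500
    + [{"city": "Berlin", "country": "DE"}] * 500
)

print(independent_estimate(rows, city="Paris", country="FR"))  # 250.0
print(actual_count(rows, city="Paris", country="FR"))          # 500
```

Here the estimate is off by a factor of two with only two predicates. Across a join of many tables, errors like this compound, which is one way a plan with a low estimated cost can end up slower than one with a high estimated cost.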

These assumptions are made so that the optimizer can return an execution cost that is a single figure (and not an interval). For most queries these approximations don't matter much and the result is good enough.

However, you may find that sometimes the situation is simply too complex for the optimizer and even gathering extra statistics doesn't help. In that case you'll have to manually optimize the query, either by adding hints yourself, by rewriting the query or by using Oracle tools (such as SQL profiles).

If Oracle could devise a way to determine the execution cost accurately, we would never need to optimize a query manually in the first place!

Answered 2013-05-22T15:43:42.070

The reported cost is for the execution of the complete query, not just the first set of rows. (PostgreSQL does the costing slightly differently, in that it provides the cost for the initial return of rows and for the complete set).

For some plans the majority of the cost is incurred before the first rows are returned (e.g. where a sort-merge join is used), and for others the initial cost is very low but the cost per row is relatively high thereafter (e.g. a nested loop join).

So if you are optimising for the return of the first few rows and joining 19 tables, you may get a very low cost for the return of the first 20 rows with a nested loop-based plan. However, for the complete set of rows, the cost of that plan might be very much higher than that of other plans that are optimised for returning all rows at the expense of a delay in returning the first.
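A rough way to picture this trade-off is a toy linear cost model in Python: each plan has a startup cost paid before the first row and a per-row cost after that. All the numbers and names below (`NESTED_LOOP`, `SORT_MERGE`, `cost_to_fetch`, `cheaper_plan`) are invented for illustration and are not taken from any real Oracle plan.

```python
def cost_to_fetch(startup, per_row, rows_fetched):
    """Toy model: cost paid before the first row, plus a fixed cost per row returned."""
    return startup + per_row * rows_fetched

# Hypothetical plan profiles (made-up units):
NESTED_LOOP = {"startup": 1, "per_row": 5}       # cheap to start, expensive per row
SORT_MERGE = {"startup": 1500, "per_row": 0.5}   # expensive to start, cheap per row

def cheaper_plan(rows_fetched):
    """Return the name of the plan with the lower toy cost for this many rows."""
    nested = cost_to_fetch(rows_fetched=rows_fetched, **NESTED_LOOP)
    merge = cost_to_fetch(rows_fetched=rows_fetched, **SORT_MERGE)
    return "nested loop" if nested < merge else "sort-merge"

print(cheaper_plan(20))     # nested loop  (101 vs 1510)
print(cheaper_plan(10000))  # sort-merge   (50001 vs 6500)
```

Under these assumed numbers, a paginated query that only ever fetches 20 rows is much better served by the nested loop plan, even though its cost for the complete result set is several times higher.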

Answered 2013-05-22T15:56:08.353