sql - What are the performance implications of an Oracle IN clause with no joins?

Question

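I have a query of the form below. On average it takes about 100 IN clause elements, and in some rare cases more than 1000. When there are more than 1000 elements, we split the IN clause into chunks of 1000 (the Oracle maximum).

The SQL is of the form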

SELECT * FROM tab WHERE PrimaryKeyID IN (1,2,3,4,5,...)

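The table I am selecting from is large and will contain millions more rows than there are elements in my IN clause. I am concerned that the optimizer may elect to do a table scan (our database does not have up-to-date statistics - yes - I know...)

Is there a hint I can pass to force the use of the primary key, without knowing the name of the primary key's index? Perhaps something like ... /*+ DO_NOT_TABLE_SCAN */?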

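Are there any creative approaches to pulling back the data such that

We perform the least number of round trips
We read the least number of blocks (at the logical IO level?)

Would this be faster?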

SELECT * FROM tab WHERE PrimaryKeyID = 1
  UNION
SELECT * FROM tab WHERE PrimaryKeyID = 2
  UNION
SELECT * FROM tab WHERE PrimaryKeyID = 3
  UNION ....

score 6 · Accepted Answer

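If the statistics on your table are accurate, it should be very unlikely that the optimizer would choose a table scan over the primary key index when you have only 1000 hard-coded elements in the WHERE clause. The best approach is to gather (or set) accurate statistics on your objects, since that should cause good things to happen automatically, rather than doing a lot of gymnastics to work around incorrect statistics.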
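For reference, gathering or setting statistics is typically done through the DBMS_STATS package. The following is only a sketch: it assumes the table is the `tab` table from the question, that it lives in the current schema, and the row and block counts in the second call are made-up placeholder values.

```sql
-- Gather fresh statistics for the table.
-- 'TAB' is the table from the question; USER is the current schema (assumption).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'TAB',
    cascade => TRUE   -- also gather statistics on the table's indexes
  );
END;
/

-- Alternatively, set representative statistics by hand.
-- The numbers below are hypothetical placeholders, not measured values.
BEGIN
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER,
    tabname => 'TAB',
    numrows => 5000000,
    numblks => 100000
  );
END;
/
```

Gathering is generally preferable when it is allowed to run; setting statistics by hand is mostly useful when gathering on the real data is not practical.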

If we assume that the statistics are inaccurate to the degree that the optimizer would be led to believe that a table scan is more efficient than using the primary key index, you could potentially add a DYNAMIC_SAMPLING hint, which forces the optimizer to gather more accurate statistics before optimizing the statement, or a CARDINALITY hint, which overrides the optimizer's default cardinality estimate. Neither requires knowing anything about the available indexes; you only need to know the table alias (or the table name if there is no alias). DYNAMIC_SAMPLING would be the safer, more robust approach, but it adds time to the parsing step.
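As a rough illustration of the two hints, using the query shape from the question: the alias `t`, the sampling level 4, and the row estimate 100 are all illustrative choices, not recommendations. Note also that CARDINALITY is not a documented hint, so its behavior may vary between releases.

```sql
-- Ask the optimizer to sample the table while parsing the statement.
-- Higher levels sample more blocks and cost more parse time.
SELECT /*+ DYNAMIC_SAMPLING(t 4) */ *
  FROM tab t
 WHERE PrimaryKeyID IN (1, 2, 3, 4, 5);

-- Or tell the optimizer roughly how many rows to expect from t.
SELECT /*+ CARDINALITY(t 100) */ *
  FROM tab t
 WHERE PrimaryKeyID IN (1, 2, 3, 4, 5);
```

In both cases the hint references the alias `t`, not an index name, which is what makes it usable without knowing how the primary key index is named.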

If you are building up a SQL statement with a variable number of hard-coded parameters in an IN clause, you're likely going to be creating performance problems for yourself by flooding your shared pool with non-sharable SQL and forcing the database to spend a lot of time hard parsing each variant separately. It would be much more efficient if you created a single sharable SQL statement that could be parsed once. Depending on where your IN clause values are coming from, that might look something like

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM global_temporary_table);

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM TABLE( nested_table ));

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM some_other_source);
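To make the first two variants a bit more concrete, here is a minimal sketch. The names `key_list_gtt` and `number_list_t` and the bind variables `:id` and `:id_list` are hypothetical, and the sketch assumes a numeric primary key.

```sql
-- Variant 1: global temporary table.
-- One-time setup; rows are private to each session and vanish on commit.
CREATE GLOBAL TEMPORARY TABLE key_list_gtt (
  primary_key NUMBER PRIMARY KEY
) ON COMMIT DELETE ROWS;

-- Per request: load the keys with a bind variable. Most client drivers can
-- batch this insert so that all keys travel in one round trip.
INSERT INTO key_list_gtt (primary_key) VALUES (:id);

-- The query text never changes, so it is parsed once and shared.
SELECT t.*
  FROM table_name t
 WHERE t.primary_key IN (SELECT g.primary_key
                           FROM key_list_gtt g);

-- Variant 2: SQL collection type bound from the client.
CREATE TYPE number_list_t AS TABLE OF NUMBER;
/

SELECT t.*
  FROM table_name t
 WHERE t.primary_key IN (SELECT column_value
                           FROM TABLE(CAST(:id_list AS number_list_t)));
```

With either variant the optimizer may have little information about how many keys are in the list, so a DYNAMIC_SAMPLING or CARDINALITY hint on the temporary table or collection can still be worth testing.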

If you got yourself down to a single sharable SQL statement, then in addition to avoiding the cost of constantly re-parsing the statement, you'd have a number of options for forcing a particular plan that don't involve modifying the SQL statement. Different versions of Oracle have different options for plan stability: there are stored outlines, SQL plan management, and SQL profiles, among other technologies, depending on your release. You can use these to force particular plans for particular SQL statements. If you keep generating new SQL statements that have to be re-parsed, however, it becomes very difficult to use these technologies.
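As one example of what that can look like with SQL plan management, the sketch below loads a plan that is already in the cursor cache as a baseline for the statement. It assumes a release and edition where DBMS_SPM is available, that the statement has already run with the plan you want, and that you have access to v$sql. The LIKE pattern, the `sql_id`, and the `plan_hash_value` are placeholders to be replaced with your own values.

```sql
-- Step 1: find the statement and the plan it is currently using.
-- The LIKE pattern is a hypothetical match for the sharable query above.
SELECT sql_id, plan_hash_value, sql_text
  FROM v$sql
 WHERE sql_text LIKE 'SELECT *%FROM table_name%global_temporary_table%';

-- Step 2: load that plan as a SQL plan baseline.
-- Both argument values below are placeholders taken from step 1.
DECLARE
  l_plans_loaded PLS_INTEGER;
BEGIN
  l_plans_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                      sql_id          => 'abcd1234efgh5',
                      plan_hash_value => 1234567890);
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_plans_loaded);
END;
/
```

This only works because the statement text is stable; with a different literal list in every execution there would be a different `sql_id` each time and nothing to attach the baseline to.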
