sql - Optimizing sql query with subselect list in clause

Question

I'm using Oracle 11g and trying to optimize a query.

The basic structure of the query is:

SELECT val1, val2, val3
FROM 
table_name
WHERE
val1 in (subselect statement is here, it selects a list of possible values for 
    val1 from another table) 
and val5>=X and val5<=Y
group by val1
order by val2 desc;

My issue is that when I use a subselect, the cost is 3130. If I fill in the results of the subselect by hand - so, for example

field1 in (1, 2, 3, 4, 5, 6)

where (1, 2, 3, 4, 5, 6) is the result of the subselect (in this case, every possible value of field1), the cost of the query is 14, and Oracle uses an "inlist iterator" for the group by part of the query. The results of the two queries are identical.

My question is how to mimic the behaviour of manually listing the possible values of field1 with a subselect statement. The reason I don't list those values in the query is that the possible values change based on one of the other fields, so the subselect is pulling the possible values of field1 from a 2nd table based on, say, field2.

I have an index on (val1, val5), so it isn't doing any full table scans. It does a range scan in both cases, but in the subselect case the range scan is much more expensive. However, that isn't the most expensive part of the subselect query; the most expensive part is the group by, which is a HASH.

Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.

The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.

Edit2 - What I ended up doing was 2 separate queries so I could generate the list used in the subselect. I actually tried a similar test in sqlite, and it does the same thing, so this isn't just Oracle.
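
The two-query approach from Edit2 can be sketched roughly as follows. This is a minimal illustration using Python's sqlite3 module rather than Oracle, and the table names (lookup_vals, facts), the sample rows, and the SUM aggregate are all assumptions made up for the example, not taken from the real query.

```python
import sqlite3

# Hypothetical schema: "lookup_vals" and "facts" stand in for the question's
# second table and table_name; they are not from the original post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup_vals (field2 TEXT, val1 INTEGER);
    CREATE TABLE facts (val1 INTEGER, val2 INTEGER, val5 INTEGER);
    CREATE INDEX facts_val1_val5 ON facts (val1, val5);
    INSERT INTO lookup_vals VALUES ('a', 1), ('a', 2), ('b', 3);
    INSERT INTO facts VALUES (1, 10, 5), (1, 20, 7), (2, 35, 6), (3, 40, 6), (2, 50, 99);
""")

def query_with_explicit_list(conn, field2, low, high):
    # Step 1: run the subselect on its own to get the candidate values.
    values = [row[0] for row in conn.execute(
        "SELECT DISTINCT val1 FROM lookup_vals WHERE field2 = ?", (field2,))]
    if not values:
        # Many databases, Oracle included, reject an empty IN () list.
        return []
    # Step 2: build one placeholder per value, so the values are passed as
    # bind parameters instead of being concatenated into the SQL text.
    placeholders = ", ".join("?" for _ in values)
    sql = (
        "SELECT val1, SUM(val2) AS total FROM facts "
        f"WHERE val1 IN ({placeholders}) AND val5 >= ? AND val5 <= ? "
        "GROUP BY val1 ORDER BY total DESC"
    )
    return conn.execute(sql, (*values, low, high)).fetchall()

result = query_with_explicit_list(conn, "a", 5, 10)
```

Oracle limits an IN list to 1000 expressions, which leaves plenty of room for the 1-50 values mentioned above. Whether Oracle actually chooses the inlist iterator plan for a list built this way would still need to be confirmed with explain plan.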

score 4 · Accepted Answer

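What you are seeing is the result of the IN () list being subject to bind variable peeking. When you have histograms and write a query like "where a = 'a'", Oracle uses the histogram to estimate how many rows will be returned (the same idea applies to the inlist operator, which iterates over each item and aggregates the rows). If there are no histograms, it guesses using rows / distinct values. With a subquery, Oracle does not do this (in most cases; there is one special case where it does).

For example: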

SQL> create table test
  2  (val1 number, val2 varchar2(20), val3 number);

Table created.

Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;

1000 rows created.

Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;

100000 rows created.

So I have a table with 101,200 rows. For VAL1, 100 rows are 1, 1,000 are 2, 100 are 3, and 100k are 4.

Now, if we gather histograms (and in this case we do want them):

SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);

SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);

we see the following:

SQL> explain plan for select * from test where val1 in (1, 2, 3) ;

Explained.

SQL> @explain ""

Plan hash value: 3165434153

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1200 | 19200 |    23   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST  |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | TEST1 |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

versus

SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain ""

Plan hash value: 441162525

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | TEST    | 25300 |   395K|   105   (3)| 00:00:02 |
----------------------------------------------------------------------------------------

where the lookup table is

SQL> select * From lookup;

        ID STR
---------- ----------
         1 A
         2 B
         3 C
         4 D

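(str is uniquely indexed and has a histogram.)

Notice the cardinality of 1200 for the inlist and a good plan, while the estimate for the subquery is wildly inaccurate. Oracle did not use the histogram across the join condition; instead it said, "look, I don't know what id will be, so I'll guess total rows (100k+1000+100+100) / distinct values (4) = 25300 and use that". As a result, it picked a full table scan.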
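
The two estimates can be reproduced with a quick sketch. This is an illustration in Python's sqlite3 (not Oracle) that rebuilds the same row distribution and computes both numbers directly; it mimics the arithmetic only, not Oracle's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (val1 INTEGER, val2 TEXT, val3 INTEGER)")

# Same distribution as the inserts above: val1 -> number of rows.
distribution = {1: 100, 2: 1000, 3: 100, 4: 100000}
for val1, n in distribution.items():
    conn.executemany(
        "INSERT INTO test VALUES (?, 'aaaaaaaaaa', ?)",
        ((val1, i % 5) for i in range(1, n + 1)),
    )

# Guess made without knowing the value: total rows / distinct values.
uniform_guess = conn.execute(
    "SELECT COUNT(*) / COUNT(DISTINCT val1) FROM test").fetchone()[0]

# What per-value frequencies (a histogram) give for the list (1, 2, 3).
inlist_rows = conn.execute(
    "SELECT COUNT(*) FROM test WHERE val1 IN (1, 2, 3)").fetchone()[0]

print(uniform_guess, inlist_rows)
```

The first number matches the 25300 rows in the subquery plan, and the second matches the 1200 rows in the inlist plan.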

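That's all well and good, but how do you fix it? If you know this subquery will match a small number of rows (and we do), then you have to hint the outer query to try to get it to use an index, like this: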

SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain

Plan hash value: 702117913

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TEST    | 25300 |   395K|   455   (1)| 00:00:06 |
|*  5 |    INDEX RANGE SCAN          | TEST1   | 25300 |       |    61   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

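There is one more thing that applies in my particular case. Since val1=4 makes up most of the table, suppose my standard query is: select * from test t where val1 in (select id from lookup where str = :B1);

for the possible :B1 inputs. If I know that the valid values passed in are A, B and C (i.e. not D, which maps to id=4), I can add this trick: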

SQL> explain plan for select  * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;

Explained.

SQL> @explain ""

Plan hash value: 771376936

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  1 |  HASH JOIN                    |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  2 |   VIEW                        | index$_join$_002 |     1 |     5 |     1 (100)| 00:00:01 |
|*  3 |    HASH JOIN                  |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN          | LOOKUP1          |     1 |     5 |     0   (0)| 00:00:01 |
|   5 |     INLIST ITERATOR           |                  |       |       |            |          |
|*  6 |      INDEX UNIQUE SCAN        | SYS_C002917051   |     1 |     5 |     0   (0)| 00:00:01 |
|   7 |   INLIST ITERATOR             |                  |       |       |            |          |
|   8 |    TABLE ACCESS BY INDEX ROWID| TEST             |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  9 |     INDEX RANGE SCAN          | TEST1            |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

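Now notice that Oracle has a reasonable cardinality estimate: it pushed the 1, 2, 3 onto the TEST table and got 1200. That is not 100% accurate, since I am only ever filtering on one of them, but I have told Oracle: certainly NOT 4!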

score 2 · Accepted Answer

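I did some research, and I think everything is explained in the Oracle docs.
Just look at "How the CBO Evaluates IN-List Iterators" and compare it with "How the CBO Evaluates the IN Operator".

Your query with "field1 in (1, 2, 3, 4, 5, 6)" matches the first case, but the query with the subselect is rewritten by Oracle.

So every query with a subselect or join will have a cost similar to yours, unless you find some very tricky way to pass the subquery's result in as a parameter.

You can always try giving the sort more memory.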

score 1 · Accepted Answer

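You can fix this statement by adding an index for the subselect. However, you would have to post the query and the execution plan for anyone to work that out. By the way, how long does the subselect take on its own?

You can try one of the following two versions: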

select val1, val2, val3
from table_name join
     (select distinct val from (subselect here)) t
     on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;

or:

select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
      exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;

These are semantically equivalent, and one of them may optimize better.
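
As a small check of that equivalence, here is a sketch in Python's sqlite3 (not Oracle). The table "other" and its rows are assumptions standing in for the unspecified subselect. It also shows why the join version needs the distinct: without it, a duplicate value in the subselect duplicates rows in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (val1 INTEGER, val2 INTEGER, val5 INTEGER);
    CREATE TABLE other (val INTEGER);   -- hypothetical source of the subselect
    INSERT INTO table_name VALUES (1, 10, 5), (2, 20, 6), (3, 30, 7), (1, 40, 50);
    INSERT INTO other VALUES (1), (1), (3);   -- note the duplicate 1
""")

where = "val5 >= 0 AND val5 <= 10"
queries = {
    "in": f"SELECT val1, val2 FROM table_name WHERE {where} "
          "AND val1 IN (SELECT val FROM other)",
    "join": "SELECT val1, val2 FROM table_name "
            "JOIN (SELECT DISTINCT val FROM other) t ON table_name.val1 = t.val "
            f"WHERE {where}",
    "exists": f"SELECT val1, val2 FROM table_name WHERE {where} "
              "AND EXISTS (SELECT 1 FROM other t WHERE t.val = table_name.val1)",
    # Same join without DISTINCT, to show the row duplication it prevents.
    "join_no_distinct": "SELECT val1, val2 FROM table_name "
                        "JOIN other t ON table_name.val1 = t.val "
                        f"WHERE {where}",
}
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}
```

The "in", "join" and "exists" entries hold the same rows, while "join_no_distinct" returns the val1 = 1 row twice. Which form gets the cheapest plan on Oracle is a separate question that only the execution plans can answer.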

Another approach that might work is to filter after the grouping. Something like:

select t.*
from (select val1, val2, val3
      from table_name
      where val5>=X and val5<=Y
      group by val1, val2, val3
     ) t
where val1 in (subselect here)
order by val2 desc;
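
A quick sanity check that filtering after the grouping returns the same rows, again as a sqlite3 sketch rather than Oracle. The table "other", the sample rows, and the SUM aggregate are assumptions added for illustration. The two forms agree here because val1 is a grouping column, so removing whole groups before or after aggregation leaves the remaining groups unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (val1 INTEGER, val2 INTEGER, val5 INTEGER);
    CREATE TABLE other (val INTEGER);   -- hypothetical source of the subselect
    INSERT INTO table_name VALUES (1, 10, 5), (1, 15, 6), (2, 20, 6), (3, 30, 7);
    INSERT INTO other VALUES (1), (3);
""")

# Filter on val1 first, then group.
filter_first = conn.execute("""
    SELECT val1, SUM(val2) AS total FROM table_name
    WHERE val5 >= 0 AND val5 <= 10 AND val1 IN (SELECT val FROM other)
    GROUP BY val1 ORDER BY total DESC
""").fetchall()

# Group everything in the val5 range, then filter the grouped rows.
filter_after = conn.execute("""
    SELECT t.* FROM (
        SELECT val1, SUM(val2) AS total FROM table_name
        WHERE val5 >= 0 AND val5 <= 10
        GROUP BY val1
    ) t
    WHERE t.val1 IN (SELECT val FROM other)
    ORDER BY total DESC
""").fetchall()
```

The trade-off is that the second form aggregates every val1 in the val5 range before discarding groups, so it is likely to help only when the subselect keeps most of the values.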
