2

我很难理解为什么当绑定变量是 OR 条件的一部分时 Oracle CBO 的行为方式。

我的环境

Oracle 12.2 优于 Red Hat Linux 7

暗示。我只是提供问题所在查询的简化

$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> @test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> SELECT
  2  whs.* ,
  3  count(*) over () AS TOTAL
  4  FROM ALFAMVS.WHS_LOANS whs
  5  WHERE
  6  ( nvl(:loanIds,'XX') = 'XX' or
  7              loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
  8                                           connect by level <= regexp_count(:loanIds,'[^,]+'))
  9  )
 10  ;

7 rows selected.

Elapsed: 00:00:18.72

Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427

------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |           |  6729 |  6748K|  2621   (1)| 00:00:01 |
|   1 |  WINDOW BUFFER                           |           |  6729 |  6748K|  2621   (1)| 00:00:01 |
|*  2 |   FILTER                                 |           |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | WHS_LOANS |   113K|   110M|  2621   (1)| 00:00:01 |
|*  4 |    FILTER                                |           |       |       |            |          |
|*  5 |     CONNECT BY WITHOUT FILTERING (UNIQUE)|           |       |       |            |          |
|   6 |      FAST DUAL                           |           |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(:LOANIDS,'XX')='XX' OR  EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
              SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
              REGEXP_COUNT (:LOANIDS,'[^,]+')))
   4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
   5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))


Statistics
----------------------------------------------------------
        288  recursive calls
        630  db block gets
       9913  consistent gets
          1  physical reads
     118724  redo size
      13564  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
     113003  sorts (memory)
          0  sorts (disk)
          7  rows processed

SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;

  COUNT(*)
----------
    113095

1 row selected.

Elapsed: 00:00:00.14

关键点

  1. 我确实知道,如果我通过使用两个选择来更改 OR 表达式并且 UNION ALL 可以完美地工作。问题是我有很多条件以同样的方式完成,所以 UNION ALL 不是我的解决方案。
  2. 该表具有使用 FOR ALL COLUMNS SIZE AUTO 和 ESTIMATE PERCENT 10% 计算的最新统计信息。
  3. 就我而言,动态 SQL 不是解决方案,因为查询是通过第三方软件调用的,该软件使用 API Web 将结果转换为 JSON。
  4. 我能够以现在需要 19 秒的方式使用按级别连接来改写正则表达式。在它需要40秒之前。
  5. 该表只有 113K 条记录,没有索引。
  6. 该查询有 20 个此类条件,所有条件均以相同的方式编写,因为通过 API 触发查询的 Web 应用程序中的屏幕允许用户使用任何参数组合或根本不使用任何参数组合。

如果我删除表达式NVL(:loanIds,'XX') = 'XX' OR,查询需要 0.01 秒。为什么这个带有 BIND 的 OR 表达式让优化器如此头痛?

- 更新 -

我要感谢@Alex Poole 的建议,并与他分享第三种选择(删除正则表达式)的魅力。不过,如果能理解其中的原因,那就太好了。你有我最诚挚的谢意。我用了一段时间,我从来没有遇到过这个问题。此外,使用建议regexp_like甚至比使用regexp_substrand的原始建议更好connect by level,但比根本不使用正则表达式的建议要慢得多

原始查询

7 rows selected.

Elapsed: 00:00:36.29

新查询

7 rows selected.

Elapsed: 00:00:00.58

一旦EXISTS内部谓词消失,查询就会像地狱一样快。

谢谢大家的意见 !

4

1 回答 1

2

出于某种原因,优化器从执行计划中重新评估表中每一行的分层查询,然后exists()用于查看该行的 ID 是否在结果中。目前尚不清楚为什么or会导致这种情况。这可能是要向 Oracle 提出的问题。

通过实验,我可以看到至少部分解决该问题的三种方法——尽管我确信还有其他方法。第一个是将 CSV 扩展移动到 CTE,然后通过提示强制实现:

WITH loanIds_cte (loanId) as (
  select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
  from dual
  connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or
              loanid IN (select loanId from loanIds_cte)
  )
;

PLAN_TABLE_OUTPUT                                                                   
------------------------------------------------------------------------------------
Plan hash value: 3226738189
 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                             |  1102 |  9918 |    11   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION     |                             |       |       |            |          |
|   2 |   LOAD AS SELECT               | SYS_TEMP_0FD9FD2A6_198A2E1A |       |       |            |          |
|*  3 |    CONNECT BY WITHOUT FILTERING|                             |       |       |            |          |
|   4 |     FAST DUAL                  |                             |     1 |       |     2   (0)| 00:00:01 |
|   5 |   WINDOW BUFFER                |                             |  1102 |  9918 |     9   (0)| 00:00:01 |
|*  6 |    FILTER                      |                             |       |       |            |          |
|   7 |     TABLE ACCESS FULL          | WHS_LOANS                   | 11300 |    99K|     9   (0)| 00:00:01 |
|*  8 |     VIEW                       |                             |     1 |  2002 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL         | SYS_TEMP_0FD9FD2A6_198A2E1A |     1 |  2002 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
   6 - filter(:LOANIDS IS NULL OR  EXISTS (SELECT 0 FROM  (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0" 
              "LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
   8 - filter(SYS_OP_C2C("LOANID")=:B1)

这仍然会对 进行奇怪的转换exists(),但至少现在正在查询物化 CTE,因此该connect by查询仅被评估一个。

或者您可以loadId使用正则表达式将每个值与完整字符串进行比较:

SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or 
    regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
  )
;

PLAN_TABLE_OUTPUT                                                                   
------------------------------------------------------------------------------------
Plan hash value: 1622376598
 
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1102 |  9918 |     9   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER     |           |  1102 |  9918 |     9   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| WHS_LOANS |  1102 |  9918 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(:LOANIDS IS NULL OR  REGEXP_LIKE 
              (:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))

在我的测试中,它比 CTE 慢,因为正则表达式仍然很昂贵,而且你正在做 113k 个(仍然比 2 x 113k x 它们的元素数好)。

或者您可以避免使用正则表达式并使用几个单独的比较:

SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or 
    :loanIds like loanId || ',%' or
    :loanIds like '%,' || loanId or
    :loanIds like '%,' || loanId || ',%'
  )
;

PLAN_TABLE_OUTPUT                                                                    
------------------------------------------------------------------------------------
Plan hash value: 1622376598
 
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  2096 | 18864 |     9   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER     |           |  2096 | 18864 |     9   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| WHS_LOANS |  2096 | 18864 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE 
              SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE 
              SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE 
              SYS_OP_C2C(U'%,'||"LOANID"||U',%'))

在我有限的测试中,这是这三个选项中最快的。但很可能有更好更快的方法。


不是很相关,但是您似乎将其作为 SYS 运行,这不是一个好主意,即使数据位于另一个模式中;您的loanId专栏似乎是nvarchar2(来自 SYS_OP_C2C 调用),这对于可能是数字但在任何情况下似乎只有 ASCII 字符的东西来说似乎很奇怪;NVL(:loanIds,'')什么都不做,因为空字符串和空字符串在 Oracle 中是一样的;并且nvl(:loanIds,'XX') = 'XX'可以避免:loanIds is not null魔法值。

于 2021-06-10T17:36:00.033 回答