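Suppose we define three tables (A, B and C) in the contrived example below, where A and B are related to C through foreign keys on C. Suppose we want values from all three tables, with predicates on A and B. Oracle can only join two row sets at a time, so we get a join order like ((A -> C) -> B). This means we spend I/O fetching rows from C, only to throw most of them away when we join back to B (and apply B's predicate).
How can we avoid this "wasted" I/O on table C?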
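Star transformation is great, but it only kicks in when the optimizer decides the cost justifies it. In other words, we are not guaranteed to get a star transformation. That may sound like what one would want. However, the optimizer is working from poor row estimates (see the example below: off by a factor of 10). As a result, it chooses not to use star transformation even where it would have been beneficial.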
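Because the SQL is generated by a BI reporting tool, we can't hand-write the query in its star-transformed form.
Perhaps my question is: how can we "force" the optimizer to use star transformation without hand-writing the query in that form? Or perhaps it is: how can we get better row estimates, so that we can be more confident the optimizer will choose star transformation? Or perhaps (quite likely) there is some other neat Oracle feature I don't know about yet that could solve this.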
Oracle 12.1 Enterprise Edition (upgrading to 19.1 in a few months). Thanks in advance.
drop table join_back_c;
drop table join_back_a;
drop table join_back_b;
create table join_back_a
as
with "D" as (select 1 from dual connect by level <= 1000)
select rownum a_code
, rpad('x',100) a_name
from "D"
;
create unique index IX_join_back_a_code on join_back_a(a_code);
alter table join_back_a add constraint PK_dan_join_back_a primary key (a_code);
create table join_back_b
as
with "D" as (select /*+ materialize */ 1 from dual connect by level <= 320)
select rownum b_id
, mod(rownum, 10) b_group
from "D", "D"
where rownum <= 100000 --100k
;
create unique index IX_join_back_b_id on join_back_b(b_id);
create index IX_join_back_b_group on join_back_b(b_group);
alter table join_back_b add constraint PK_dan_join_back_b primary key (b_id);
create table join_back_c
as
with "D" as (select /*+ materialize */ level from dual connect by level <= 3200)
select rownum c_id
, trunc(dbms_random.value(1, 1000)) a_code --table a FK
, trunc(dbms_random.value(1, 100000)) b_id --table b FK
from "D", "D"
where rownum <= 1000000 -- 1M
;
create index IR_join_back_c_a_code on join_back_c(a_code);
create index IR_join_back_c_b_id on join_back_c(b_id);
exec dbms_stats.gather_table_stats(user, 'JOIN_BACK_C');
exec dbms_stats.gather_table_stats(user, 'JOIN_BACK_A');
exec dbms_stats.gather_table_stats(user, 'JOIN_BACK_B');
select *
from join_back_a "A"
join join_back_c "C"
on A.a_code = C.a_code
join join_back_b "B"
on B.b_id = C.b_id
where a.a_code = 1
and b.b_group = 1
;
--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                       |  1001 |   124K|   983   (2)| 00:00:01 |
|*  1 |  HASH JOIN                     |                       |  1001 |   124K|   983   (2)| 00:00:01 |
|   2 |   NESTED LOOPS                 |                       |       |       |            |          |
|   3 |    NESTED LOOPS                |                       |  1001 |   116K|   839   (1)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| JOIN_BACK_A           |     1 |   105 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | IX_JOIN_BACK_A_CODE   |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | IR_JOIN_BACK_C_A_CODE |  1001 |       |     4   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID | JOIN_BACK_C           |  1001 | 14014 |   837   (1)| 00:00:01 |
|*  8 |   TABLE ACCESS FULL            | JOIN_BACK_B           | 10000 | 80000 |   143   (5)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."B_ID"="C"."B_ID")
5 - access("A"."A_CODE"=1)
6 - access("C"."A_CODE"=1)
8 - filter("B"."B_GROUP"=1)
select count(*)
from join_back_a "A"
join join_back_c "C"
on A.a_code = C.a_code
join join_back_b "B"
on B.b_id = C.b_id
where a.a_code = 1
and b.b_group = 1
; -- about 100 rows
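As a sanity check on the "about 100 rows" figure, the expected cardinalities follow directly from the data generator. The same is true of the estimates for steps 3 and 8 in the plan above. The sketch below is plain arithmetic in Python, not Oracle. It assumes `dbms_random.value(low, high)` returns values uniformly in [low, high), so `trunc()` yields 1..999 for `a_code` and 1..99999 for `b_id`.

```python
# Plain arithmetic (not Oracle): cardinalities implied by the test-data script,
# assuming dbms_random.value(low, high) is uniform on [low, high).
C_ROWS = 1_000_000
A_CODES_IN_C = range(1, 1000)     # trunc(dbms_random.value(1, 1000))   -> 1..999
B_IDS_IN_C = range(1, 100000)     # trunc(dbms_random.value(1, 100000)) -> 1..99999
B_IDS_IN_B = range(1, 100001)     # rownum <= 100000                    -> 1..100000

# Step 3: rows of C matching a_code = 1
rows_a_to_c = C_ROWS / len(A_CODES_IN_C)

# Step 8: rows of B with b_group = mod(b_id, 10) = 1
rows_b_group_1 = sum(1 for b in B_IDS_IN_B if b % 10 == 1)

# Fraction of C's b_id values whose B row is in group 1
frac_group_1 = sum(1 for b in B_IDS_IN_C if b % 10 == 1) / len(B_IDS_IN_C)

# Step 1: rows expected to survive the join back to B
rows_final = rows_a_to_c * frac_group_1

print(f"A -> C (step 3): {rows_a_to_c:.0f}")
print(f"B filtered (step 8): {rows_b_group_1}")
print(f"final rows (step 1): {rows_final:.0f}")
```

It prints 1001, 10000 and 100. That matches the plan's estimates for steps 3 and 8 and the roughly 100 rows actually returned, whereas the plan still estimates step 1 at 1001.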
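Join order: ((A -> C) -> B)
The estimated row count for A -> C (step 3) is accurate, at about 1k.
The estimate for step 8 is accurate as well.
However, the join to B (step 1) only reduces the 1k-row set from step 3 further. In this case, B's predicate cuts the (A -> C) row set to one tenth. Step 1 is nonetheless estimated at 1001 rows, against roughly 100 actual rows.
So we access 1000 rows from C only to throw away 900 of them.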
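To make the "1000 fetched, 900 discarded" arithmetic concrete, here is a small simulation of the ((A -> C) -> B) order in plain Python, not Oracle. It uses data with the same value ranges as the script. The seed is arbitrary, so the exact counts vary around roughly 1000 rows fetched and 100 kept.

```python
import random

# Toy simulation in plain Python (not Oracle) of the ((A -> C) -> B) join order.
# Value ranges mirror the test script; the seed is arbitrary.
# List positions stand in for rowids.
rng = random.Random(42)
C_ROWS = 1_000_000
c_a_code = rng.choices(range(1, 1000), k=C_ROWS)    # a_code in 1..999
c_b_id = rng.choices(range(1, 100000), k=C_ROWS)    # b_id in 1..99999

# JOIN_BACK_B: b_group = mod(b_id, 10); these b_ids pass b.b_group = 1
b_ids_group_1 = {b for b in range(1, 100001) if b % 10 == 1}

# Plan steps 3 to 7: every C row with a_code = 1 is fetched from the table
fetched_from_c = [rowid for rowid in range(C_ROWS) if c_a_code[rowid] == 1]

# Plan step 1: the hash join to the filtered B rows throws most of them away
kept = [rowid for rowid in fetched_from_c if c_b_id[rowid] in b_ids_group_1]
discarded = len(fetched_from_c) - len(kept)

print(f"fetched from C: {len(fetched_from_c)}, kept: {len(kept)}, discarded: {discarded}")
```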
select /*+ star_transformation */
*
from join_back_a "A"
join join_back_c "C"
on A.a_code = C.a_code
join join_back_b "B"
on B.b_id = C.b_id
where a.a_code = 1
and b.b_group = 1
;
--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                       |  1001 |   124K|   983   (2)| 00:00:01 |
|*  1 |  HASH JOIN                     |                       |  1001 |   124K|   983   (2)| 00:00:01 |
|   2 |   NESTED LOOPS                 |                       |       |       |            |          |
|   3 |    NESTED LOOPS                |                       |  1001 |   116K|   839   (1)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| JOIN_BACK_A           |     1 |   105 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | IX_JOIN_BACK_A_CODE   |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | IR_JOIN_BACK_C_A_CODE |  1001 |       |     4   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID | JOIN_BACK_C           |  1001 | 14014 |   837   (1)| 00:00:01 |
|*  8 |   TABLE ACCESS FULL            | JOIN_BACK_B           | 10000 | 80000 |   143   (5)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."B_ID"="C"."B_ID")
5 - access("A"."A_CODE"=1)
6 - access("C"."A_CODE"=1)
8 - filter("B"."B_GROUP"=1)
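I'm looking for an execution path similar to the one below. Although it is estimated at 10M rows, the query still returns only about 100 rows. However, we can't control the generated SQL to that degree. This is what I meant above by hand-writing the query in star-transformed form.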
select *
from join_back_a "A"
join join_back_c "C"
on A.a_code = C.a_code
join join_back_b "B"
on B.b_id = C.b_id
where C.rowid in ( select C1.rowid
from join_back_C "C1"
join join_back_a "A1"
on C1.a_code = A1.a_code
where A1.a_code = 1
intersect
select C2.rowid
from join_back_C "C2"
join join_back_b "B1"
on C2.b_id = B1.b_id
where B1.b_group = 1
)
;
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                       |  9928K|  1316M|       |  4649  (17)| 00:00:01 |
|*  1 |  HASH JOIN                    |                       |  9928K|  1316M|       |  4649  (17)| 00:00:01 |
|   2 |   TABLE ACCESS FULL           | JOIN_BACK_A           |  1000 |   102K|       |    16   (0)| 00:00:01 |
|*  3 |   HASH JOIN                   |                       |  9928K|   321M|       |  4320  (11)| 00:00:01 |
|   4 |    TABLE ACCESS FULL          | JOIN_BACK_B           |   100K|   781K|       |   142   (5)| 00:00:01 |
|   5 |    NESTED LOOPS               |                       |    10M|   248M|       |  3858   (3)| 00:00:01 |
|   6 |     VIEW                      | VW_NSO_1              |  1001 | 12012 |       |  2855   (4)| 00:00:01 |
|   7 |      INTERSECTION             |                       |       |       |       |            |          |
|   8 |       SORT UNIQUE             |                       |  1001 | 18018 |       |            |          |
|   9 |        NESTED LOOPS           |                       |  1001 | 18018 |       |     5   (0)| 00:00:01 |
|* 10 |         INDEX RANGE SCAN      | IX_JOIN_BACK_A_CODE   |     1 |     4 |       |     1   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN      | IR_JOIN_BACK_C_A_CODE |  1001 | 14014 |       |     4   (0)| 00:00:01 |
|  12 |       SORT UNIQUE             |                       | 99191 |  2131K|  3120K|            |          |
|* 13 |        HASH JOIN              |                       | 99191 |  2131K|       |  1789   (5)| 00:00:01 |
|* 14 |         TABLE ACCESS FULL     | JOIN_BACK_B           | 10000 | 80000 |       |   143   (5)| 00:00:01 |
|  15 |         INDEX FAST FULL SCAN  | IR_JOIN_BACK_C_B_ID   |  1000K|    13M|       |  1614   (3)| 00:00:01 |
|  16 |     TABLE ACCESS BY USER ROWID| JOIN_BACK_C           | 10000 |   136K|       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."A_CODE"="C"."A_CODE")
3 - access("B"."B_ID"="C"."B_ID")
10 - access("A1"."A_CODE"=1)
11 - access("C1"."A_CODE"=1)
13 - access("C2"."B_ID"="B1"."B_ID")
14 - filter("B1"."B_GROUP"=1)
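The shape of the hand-written rewrite can be modelled in plain Python. Collect C rowids through each foreign-key index separately, intersect the two sets, and only visit C for the survivors. This is a sketch, not Oracle. The `build_index` helper is hypothetical and does simple key lookups into in-memory indexes. It also ignores the hash join and fast full scan that Oracle actually chose for the B branch.

```python
import random

# Toy model in plain Python (not Oracle) of the rowid INTERSECT rewrite.
# List positions stand in for rowids; the seed is arbitrary.
rng = random.Random(42)
C_ROWS = 1_000_000
c_a_code = rng.choices(range(1, 1000), k=C_ROWS)    # a_code in 1..999
c_b_id = rng.choices(range(1, 100000), k=C_ROWS)    # b_id in 1..99999


def build_index(column):
    """Hypothetical in-memory b-tree stand-in: key -> list of C rowids."""
    index = {}
    for rowid, key in enumerate(column):
        index.setdefault(key, []).append(rowid)
    return index


ix_c_a_code = build_index(c_a_code)   # stands in for IR_JOIN_BACK_C_A_CODE
ix_c_b_id = build_index(c_b_id)       # stands in for IR_JOIN_BACK_C_B_ID

# First INTERSECT branch: C rowids for A.a_code = 1
rowids_via_a = set(ix_c_a_code.get(1, []))

# Second branch: C rowids for every B row with b_group = mod(b_id, 10) = 1
rowids_via_b = set()
for b_id in range(1, 100001):
    if b_id % 10 == 1:
        rowids_via_b.update(ix_c_b_id.get(b_id, []))

# INTERSECT, then visit C by rowid only for the survivors
survivors = rowids_via_a & rowids_via_b
rows_from_c = [(rowid, c_a_code[rowid], c_b_id[rowid]) for rowid in sorted(survivors)]

print(f"via A: {len(rowids_via_a)}, via B: {len(rowids_via_b)}, rows read from C: {len(rows_from_c)}")
```

Only the roughly 100 intersecting rows are read from C. The ~1000 rowids from the A branch are never turned into table accesses on their own.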
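Tried converting the two foreign-key indexes on table C into bitmap indexes: no luck. Also tried a composite index on table C (a_code, b_id): again, no luck. A composite index isn't desirable anyway, since our real table C has many foreign keys (some surrogate keys, some natural keys).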
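For reference, here is what bitmap indexes on C's foreign keys would enable. A star transformation typically ORs the bitmaps for all qualifying dimension keys (BITMAP MERGE, driven by BITMAP KEY ITERATION). It then ANDs the per-dimension results (BITMAP AND) and converts the surviving bits to rowids (BITMAP CONVERSION TO ROWIDS) before touching the fact table. The sketch below models only those steps, in plain Python with Python ints as uncompressed bitmaps. It is scaled down, and the table sizes, key ranges and seed are hypothetical rather than the post's data.

```python
import random

# Toy model in plain Python (not Oracle) of the bitmap steps of a star
# transformation. Python ints act as uncompressed bitmaps: bit i <-> rowid i.
# Hypothetical, scaled-down sizes so the bitmaps stay small.
rng = random.Random(7)
C_ROWS = 10_000
c_a_code = rng.choices(range(1, 100), k=C_ROWS)    # hypothetical: 99 A codes
c_b_id = rng.choices(range(1, 1000), k=C_ROWS)     # hypothetical: 999 B ids


def build_bitmap_index(column):
    """Hypothetical bitmap index on one column of C: key -> bitmap of rowids."""
    bitmaps = {}
    for rowid, key in enumerate(column):
        bitmaps[key] = bitmaps.get(key, 0) | (1 << rowid)
    return bitmaps


bm_c_a_code = build_bitmap_index(c_a_code)
bm_c_b_id = build_bitmap_index(c_b_id)

# BITMAP MERGE: OR the bitmaps of every b_id whose B row has b_group = mod(b_id, 10) = 1
merged_via_b = 0
for b_id, bitmap in bm_c_b_id.items():
    if b_id % 10 == 1:
        merged_via_b |= bitmap

# BITMAP AND with the bitmap for a_code = 1
result = bm_c_a_code.get(1, 0) & merged_via_b

# BITMAP CONVERSION TO ROWIDS: only these rows of C need to be visited
rowids = [i for i in range(C_ROWS) if (result >> i) & 1]
print(f"rows of C to visit: {len(rowids)} of {C_ROWS}")
```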