
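I can get join elimination to work in simple cases, such as one-to-one relationships, but not in slightly more complex scenarios. Ultimately I want to try Anchor Modeling, but first I need to find a way around this problem. I am using Oracle 12c Enterprise Edition Release 12.1.0.2.0.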

DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some sample data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

5NF views

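The first view does not compile. It fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, when I look at the online Anchor Modeling examples, this is how most of the historized views are defined...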

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt to fix it. When this view is queried with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------
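The sketch below makes the intended semantics concrete. It uses Python's built-in sqlite3 module as a stand-in for Oracle and loads the sample data above. SQLite does not perform Oracle's join elimination, dates are stored as ISO text, and the sketch only checks what the view returns.

It shows that the derived table yields at most one row per product_id. That guarantee comes from the max() predicate combined with the (product_id, from_date) primary key. No constraint declares product_id unique in that row set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product(product_id integer primary key);
create table product_color(product_id integer primary key references product,
                           color text not null);
create table product_price(product_id integer not null references product,
                           from_date text not null,
                           price integer not null,
                           primary key (product_id, from_date));
insert into product values (1), (2), (3), (4);
insert into product_color values (1, 'Red'), (2, 'Green');
insert into product_price values (1, '2016-01-01', 10), (1, '2016-02-01', 8),
  (1, '2016-05-01', 5), (2, '2016-02-01', 5), (4, '2016-01-01', 10);
""")

# The OP's derived table: the latest price version per product.
LATEST_PRICE = """
select pp1.product_id, pp1.price
  from product_price pp1
 where pp1.from_date = (select max(pp2.from_date)
                          from product_price pp2
                         where pp2.product_id = pp1.product_id)
"""

rows = conn.execute(f"""
select p.product_id, pc.color, pp.price
  from product p
  left join product_color pc on pc.product_id = p.product_id
  left join ({LATEST_PRICE}) pp on pp.product_id = p.product_id
 order by p.product_id
""").fetchall()

# Largest number of rows the derived table produces for any single product_id.
max_rows_per_product = conn.execute(
    f"select max(n) from (select count(*) as n from ({LATEST_PRICE}) group by product_id)"
).fetchone()[0]

print(rows)
print(max_rows_per_product)
```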

The only solution I have found is to use a scalar subquery, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     );

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

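Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently from joins. The way they are executed simply does not let me get acceptable performance in my real-world scenario.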

TL;DR: How can I rewrite the view product_5nf so that Oracle successfully eliminates both dependent tables?


5 Answers


I think you have two problems here.

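First, join elimination only works in certain specific cases (PK-PK, PK-FK, etc.). It is not a general mechanism. You cannot LEFT JOIN to an arbitrary row set that happens to return a single row per join-key value and expect Oracle to eliminate the join.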
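Uniqueness matters for a simple reason. A LEFT JOIN never removes rows from the left side, so it can only be dropped safely when it can never produce more than one match per left row.

The minimal Python/sqlite3 sketch below uses the question's sample data. SQLite is used here only to count rows; it says nothing about Oracle's optimizer. The sketch contrasts joining on a unique key with joining on a non-unique one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product(product_id integer primary key);
create table product_color(product_id integer primary key references product,
                           color text not null);
create table product_price(product_id integer not null references product,
                           from_date text not null,
                           price integer not null,
                           primary key (product_id, from_date));
insert into product values (1), (2), (3), (4);
insert into product_color values (1, 'Red'), (2, 'Green');
insert into product_price values (1, '2016-01-01', 10), (1, '2016-02-01', 8),
  (1, '2016-05-01', 5), (2, '2016-02-01', 5), (4, '2016-01-01', 10);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

products = count("select count(*) from product")
# product_color.product_id is a primary key: at most one match per product,
# so the outer join cannot change the row count.
with_color = count("select count(*) from product p "
                   "left join product_color pc on pc.product_id = p.product_id")
# product_price.product_id alone is not unique (product 1 has three versions),
# so the outer join multiplies rows and cannot simply be dropped.
with_price = count("select count(*) from product p "
                   "left join product_price pp on pp.product_id = p.product_id")

print(products, with_color, with_price)
```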

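Second, suppose Oracle were advanced enough to eliminate any LEFT JOIN where it knew it could only get one row per join-key value. It would still not support join elimination for LEFT JOINs based on a composite key. (Oracle Support document 887553.1 says this is coming in R12.2.)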

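One workaround you could consider is to use a materialized view that holds only the latest row for each product_id, and then LEFT JOIN to that materialized view. Like this: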

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a MV snapshot so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------
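The sortable_row trick works because every field has a fixed width. Within one product_id, the string maximum is therefore the row with the latest from_date, and the price can be recovered by position.

The plain-Python sketch below mirrors that encoding and decoding. The helper names (lpad, sortable_row, decode) are hypothetical stand-ins for the Oracle expressions, not Oracle APIs. The sketch assumes non-negative prices whose text form fits in 10 characters: a negative price such as -5 would pad to '00000000-5', which does not convert back to a number.

```python
from datetime import datetime
from decimal import Decimal

def lpad(value, width, fill="0"):
    # Rough stand-in for Oracle LPAD: left-pad the text form, truncating to width.
    return str(value)[:width].rjust(width, fill)

def sortable_row(product_id, from_date, price):
    # Same layout as:
    # lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')
    return lpad(product_id, 10) + from_date.strftime("%Y%m%d%H%M%S") + lpad(price, 10)

def decode(row):
    # Oracle substr is 1-based: substr(row,11,14) is the date and substr(row,25) the price.
    return datetime.strptime(row[10:24], "%Y%m%d%H%M%S"), Decimal(row[24:])

prices = [
    (1, datetime(2016, 1, 1), 10),
    (1, datetime(2016, 2, 1), 8),
    (1, datetime(2016, 5, 1), 5),
    (2, datetime(2016, 2, 1), 5),
    (4, datetime(2016, 1, 1), 10),
]

# Equivalent of: select product_id, max(sortable_row) ... group by product_id
latest = {}
for product_id, from_date, price in prices:
    row = sortable_row(product_id, from_date, price)
    latest[product_id] = max(latest.get(product_id, row), row)

for product_id, row in sorted(latest.items()):
    print(product_id, *decode(row))
```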
answered 2016-11-08T19:43:59.763

I couldn't get the price join eliminated, but if you write it like this, the price lookup is at least reduced to accessing a single index:

CREATE OR REPLACE view product_5nf as
select p.product_id
      ,pc.color
      ,pp.price 
 from product p
 left join product_color pc ON p.product_id = pc.product_id
 left join (select pp1.product_id, pp1.price 
              from (SELECT product_id,
                           price,
                           from_date,
                           max(from_date) OVER (PARTITION BY product_id) max_from_date
                    FROM   product_price) pp1
             where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;
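You can quickly check that the analytic version returns the same rows as the OP's correlated max() subquery with a Python/sqlite3 sketch like the one below. It assumes SQLite 3.25 or later, the release that added window functions. It compares results only, not Oracle execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product_price(product_id integer not null,
                           from_date text not null,
                           price integer not null,
                           primary key (product_id, from_date));
insert into product_price values (1, '2016-01-01', 10), (1, '2016-02-01', 8),
  (1, '2016-05-01', 5), (2, '2016-02-01', 5), (4, '2016-01-01', 10);
""")

# Answer's approach: one pass over product_price with an analytic max().
windowed = conn.execute("""
select pp1.product_id, pp1.price
  from (select product_id, price, from_date,
               max(from_date) over (partition by product_id) as max_from_date
          from product_price) pp1
 where pp1.from_date = pp1.max_from_date
 order by pp1.product_id
""").fetchall()

# OP's approach: correlated max() subquery against a second copy of the table.
correlated = conn.execute("""
select pp1.product_id, pp1.price
  from product_price pp1
 where pp1.from_date = (select max(pp2.from_date)
                          from product_price pp2
                         where pp2.product_id = pp1.product_id)
 order by pp1.product_id
""").fetchall()

print(windowed == correlated, windowed)
```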
answered 2016-11-08T15:38:41.393

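Quoting the question: "Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently from joins. The way they are executed simply does not let me get acceptable performance in my real-world scenario."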

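The cost-based optimizer (CBO) in Oracle 12.1 can apply a query transformation that unnests scalar subqueries into LEFT JOINs. So performance can be as good as the LEFT JOIN you were after in your question.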

The trick is that you have to nudge it a bit.

First, make sure the scalar subquery returns a max() with no GROUP BY, so the CBO knows it cannot get more than one row. (Otherwise it will not unnest it.)

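Second, you need to combine all the columns you want from product_price into a single scalar subquery. Otherwise the CBO will unnest each subquery separately and join product_price multiple times.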
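The "one scalar subquery for all columns" idea packs from_date and price into one sortable string, takes its max(), and unpacks the pieces in the outer query.

The Python/sqlite3 sketch below shows only that packing and unpacking. SQLite does not do Oracle's scalar-subquery unnesting. Dates are ISO text here, and printf('%010d', ...) stands in for lpad(price,10,0), which assumes non-negative integer prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product(product_id integer primary key);
create table product_color(product_id integer primary key references product,
                           color text not null);
create table product_price(product_id integer not null references product,
                           from_date text not null,
                           price integer not null,
                           primary key (product_id, from_date));
insert into product values (1), (2), (3), (4);
insert into product_color values (1, 'Red'), (2, 'Green');
insert into product_price values (1, '2016-01-01', 10), (1, '2016-02-01', 8),
  (1, '2016-05-01', 5), (2, '2016-02-01', 5), (4, '2016-01-01', 10);
""")

rows = conn.execute("""
select product_id, color,
       substr(packed, 1, 10)                as from_date,  -- leading 10 chars: the date
       cast(substr(packed, -10) as integer) as price       -- trailing 10 chars: the price
  from (select p.product_id, pc.color,
               -- one scalar subquery, no GROUP BY: returns exactly one value (or NULL)
               (select max(pp.from_date || printf('%010d', pp.price))
                  from product_price pp
                 where pp.product_id = p.product_id) as packed
          from product p
          left join product_color pc on pc.product_id = p.product_id)
 order by product_id
""").fetchall()

print(rows)
```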

Here is a test case for Oracle 12.1 that shows this working.

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;


create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;

insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;

--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));

commit;

exec dbms_stats.gather_table_stats( ownname => USER, tabname => 'PRODUCT' );
exec dbms_stats.gather_table_stats( ownname => USER, tabname => 'PRODUCT_COLOR' );
exec dbms_stats.gather_table_stats( ownname => USER, tabname => 'PRODUCT_PRICE' );

commit;

alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );

create or replace view product_5nf as
   select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price 
from 
(    select p.product_id
         ,pc.color
         ,( SELECT max(composite_column)  FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
     from product p
     left join product_color pc on pc.product_id = p.product_id )  d
;

select product_id from product_5nf;

----------------------------------------------
| Id  | Operation         | Name    | E-Rows |
----------------------------------------------
|   0 | SELECT STATEMENT  |         |        |
|   1 |  TABLE ACCESS FULL| PRODUCT |    100K|
----------------------------------------------

select * from product_5nf;

SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

--------------------------------------------------------------------------------------
| Id  | Operation                | Name          | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |        |       |       |          |
|*  1 |  HASH JOIN RIGHT OUTER   |               |    100K|  8387K|  3159K| 8835K (0)|
|   2 |   VIEW                   | VW_SSQ_2      |      2 |       |       |          |
|   3 |    HASH GROUP BY         |               |      2 |    13M|  2332K|   12M (0)|
|   4 |     VIEW                 | VM_NWVW_3     |      2 |       |       |          |
|*  5 |      FILTER              |               |        |       |       |          |
|   6 |       HASH GROUP BY      |               |      2 |    23M|  5055K|   20M (0)|
|*  7 |        HASH JOIN         |               |    480K|    12M|  4262K|   17M (0)|
|   8 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|   9 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|* 10 |   HASH JOIN OUTER        |               |    100K|  5918K|  3056K| 5847K (0)|
|  11 |    TABLE ACCESS FULL     | PRODUCT       |    100K|       |       |          |
|  12 |    TABLE ACCESS FULL     | PRODUCT_COLOR |    100K|       |       |          |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_2"="P"."PRODUCT_ID")
   5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
   7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
  10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")
answered 2016-11-09T13:40:04.303

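OK, I'm answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production, but may not apply to later versions. Please don't upvote this answer, because it does not actually answer the question.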

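Because of a specific limitation in the current version (as described by Mathew McPeak), it is simply not possible to get Oracle to fully eliminate the unnecessary joins in the underlying 5NF view. The limitation is that join elimination is not possible on left joins based on a composite key.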

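Any attempt to work around this limitation seems to introduce either duplication or update anomalies. The accepted answer shows how to overcome the optimizer limitation with a materialized view, at the cost of duplicating data. This answer shows how to solve the problem with less duplication, at the cost of an update anomaly.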

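This workaround relies on the fact that a unique index can contain a nullable column: rows where that column is null are not checked for uniqueness. We store null in every historical version and the actual product_id in the latest version, and reference the product table with a foreign key.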
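The sketch below checks the property this workaround depends on: many nulls are allowed in a unique column, but only one non-null value per product. It uses Python's sqlite3 as a stand-in. SQLite treats nulls in a single-column UNIQUE constraint as distinct, as Oracle does for a single-column unique key. The foreign key to product is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table product_price(
  product_id integer not null,
  from_date  text    not null,
  price      integer not null,
  latest_id  integer unique,
  primary key (product_id, from_date),
  check (latest_id = product_id))
""")

# Historical versions carry NULL in the unique column, so they never collide;
# the latest version carries its own product_id.
conn.executemany("insert into product_price values (?, ?, ?, ?)", [
    (1, '2016-01-01', 10, None),
    (1, '2016-02-01', 8, None),
    (1, '2016-05-01', 5, 1),
])

try:
    # A second "latest" version of product 1 violates the unique constraint.
    conn.execute("insert into product_price values (1, '2016-06-01', 4, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```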

alter table product_price add(
   latest_id number
  ,constraint product_price_uk  unique(latest_id)
  ,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
  ,constraint product_price_chk check(latest_id = product_id)
);

-- One-time update of existing data
update product_price a
   set a.latest_id = a.product_id
 where from_date = (select max(from_date) 
                      from product_price b 
                     where a.product_id = b.product_id);   

PRODUCT_ID FROM_DATE       PRICE  LATEST_ID
---------- ---------- ---------- ----------
         1 2016-01-01         10       null
         1 2016-02-01          8       null
         1 2016-05-01          5          1
         2 2016-02-01          5          2
         4 2016-01-01         10          4

-- New view definition             
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price
     from product p
     left join product_color pc on(pc.product_id = p.product_id)
     left join product_price pp on(pp.latest_id  = p.product_id);

Of course, latest_id now has to be maintained manually: whenever a new record is inserted, the previous latest record must first be updated to null.
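Below is one way that manual maintenance could look, sketched with Python's sqlite3 as a stand-in for Oracle. The helper add_price is hypothetical. It runs in one transaction: it first clears the old marker (otherwise the unique constraint rejects the new one), then inserts the new version, then re-marks whichever version is now the latest. Recomputing the latest row, rather than assuming the new row is the latest, also covers back-dated inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product(product_id integer primary key);
create table product_price(
  product_id integer not null references product,
  from_date  text    not null,
  price      integer not null,
  latest_id  integer unique references product(product_id),
  primary key (product_id, from_date),
  check (latest_id = product_id));
insert into product values (1);
insert into product_price values
  (1, '2016-01-01', 10, null), (1, '2016-02-01', 8, null), (1, '2016-05-01', 5, 1);
""")

def add_price(conn, product_id, from_date, price):
    """Hypothetical helper: insert a price version and move the latest_id marker."""
    with conn:  # single transaction: commit on success, roll back on error
        # 1. Clear the current marker so the unique constraint allows a new one.
        conn.execute("update product_price set latest_id = null where latest_id = ?",
                     (product_id,))
        # 2. Insert the new version (unmarked for now).
        conn.execute("insert into product_price (product_id, from_date, price) "
                     "values (?, ?, ?)", (product_id, from_date, price))
        # 3. Mark whichever version is now the latest.
        conn.execute("""
            update product_price set latest_id = product_id
             where product_id = ?
               and from_date = (select max(from_date) from product_price
                                 where product_id = ?)
        """, (product_id, product_id))

add_price(conn, 1, '2016-06-01', 4)   # new latest version
add_price(conn, 1, '2016-03-01', 7)   # back-dated version: latest stays 2016-06-01

latest = conn.execute(
    "select from_date, price from product_price where latest_id = 1").fetchall()
print(latest)
```

With concurrent sessions in Oracle, two writers for the same product would contend on the unique key. So this only pushes the anomaly into application code or a trigger; it does not remove it.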

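This approach has two benefits. First, Oracle is able to remove unnecessary joins completely. Second, the joins are not executed as scalar subqueries.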

SQL> select count(*) from product_5nf;

---------------------------------------
| Id  | Operation        | Name       |
---------------------------------------
|   0 | SELECT STATEMENT |            |
|   1 |  SORT AGGREGATE  |            |
|   2 |   INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------

Oracle realizes that the count can be resolved without even touching the base table. And no unnecessary joins in sight...

SQL> select product_id, price from product_5nf;

---------------------------------------------------------
| Id  | Operation                    | Name             |
---------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |
|*  1 |  HASH JOIN OUTER             |                  |
|   2 |   INDEX FULL SCAN            | PRODUCT_PK       |
|   3 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE    |
|*  4 |    INDEX FULL SCAN           | PRODUCT_PRICE_UK |
---------------------------------------------------------

Oracle realizes that it has to join product_price to get the price column. And product_color is nowhere to be seen...

SQL> select * from product_5nf;

----------------------------------------------------------
| Id  | Operation                     | Name             |
----------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |
|*  1 |  HASH JOIN OUTER              |                  |
|   2 |   NESTED LOOPS OUTER          |                  |
|   3 |    INDEX FULL SCAN            | PRODUCT_PK       |
|   4 |    TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR    |
|*  5 |     INDEX UNIQUE SCAN         | PRODUCT_COLOR_PK |
|   6 |   TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE    |
|*  7 |    INDEX FULL SCAN            | PRODUCT_PRICE_UK |
----------------------------------------------------------

Here Oracle has to perform all the joins, because all the columns are referenced.

answered 2016-11-15T10:49:04.510

[I don't know whether an ANTI-JOIN counts as a subquery in Oracle], but the NOT EXISTS trick is often a way to avoid an aggregating subquery:

CREATE VIEW product_5nfa as
   SELECT p.product_id
         ,pc.color
         ,pp.price
     FROM product p
     LEFT JOIN product_color pc
        ON pc.product_id = p.product_id
     LEFT join product_price pp
        ON pp.product_id = p.product_id
        AND NOT EXISTS ( SELECT * FROM product_price pp2
            WHERE pp2.product_id = pp.product_id
            AND pp2.from_date  > pp.from_date
            )   
     ;
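The NOT EXISTS version keeps only the price row for which no later from_date exists. Because (product_id, from_date) is the primary key, there can be no ties, so at most one row per product survives.

The Python/sqlite3 sketch below, using the question's sample data, checks that it returns the same rows as the max()-based views. It compares results only; SQLite does not reproduce Oracle's plans or join elimination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product(product_id integer primary key);
create table product_color(product_id integer primary key references product,
                           color text not null);
create table product_price(product_id integer not null references product,
                           from_date text not null,
                           price integer not null,
                           primary key (product_id, from_date));
insert into product values (1), (2), (3), (4);
insert into product_color values (1, 'Red'), (2, 'Green');
insert into product_price values (1, '2016-01-01', 10), (1, '2016-02-01', 8),
  (1, '2016-05-01', 5), (2, '2016-02-01', 5), (4, '2016-01-01', 10);
""")

rows = conn.execute("""
select p.product_id, pc.color, pp.price
  from product p
  left join product_color pc
    on pc.product_id = p.product_id
  left join product_price pp
    on pp.product_id = p.product_id
   -- anti-join: keep pp only if no later version of the same product exists
   and not exists (select * from product_price pp2
                    where pp2.product_id = pp.product_id
                      and pp2.from_date  > pp.from_date)
 order by p.product_id
""").fetchall()

print(rows)
```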

Comment from the OP: The view is created, but Oracle still cannot eliminate the join. Here is the execution plan.

select count(*) from product_5nfa;

-------------------------------------------------
| Id  | Operation            | Name             |
-------------------------------------------------
|   0 | SELECT STATEMENT     |                  |
|   1 |  SORT AGGREGATE      |                  |
|   2 |   NESTED LOOPS OUTER |                  |
|   3 |    INDEX FULL SCAN   | PRODUCT_PK       |
|   4 |    VIEW              |                  |
|   5 |     NESTED LOOPS ANTI|                  |
|*  6 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|*  7 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------
answered 2016-11-15T11:08:36.100