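I have a large fact table and a staging table containing the PKs of ~12K jobs to load. Before creating the new data, I want to delete the parts of the old data that are about to be re-inserted.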
FACT is partitioned, but by a different column than the join predicates.
The ETL table has about 12K rows, ~440KB compressed. Each key in ETL has about 30-50 rows in FACT, so I expect roughly 400K rows to be deleted. Statistics are up to date.
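Here is a quick sanity check of the expected delete volume. It uses only the figures stated above (12K ETL keys, 30-50 FACT rows per key) and the 1219K rows the optimizer estimates for FACT in the plans below:

```latex
\begin{aligned}
12\,000 \times 30 &= 360\,000, \qquad 12\,000 \times 50 = 600\,000 \\
\frac{400\,000}{1\,219\,000} &\approx 0.33
\end{aligned}
```

So ~400K sits in the lower part of that range. By the optimizer's own estimate it is roughly a third of FACT.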
explain plan for
delete from fact f
where exists (
select 1
from etl e
where (e.id = f.id and e.somedate = f.somedate)
or (e.other_id = f.other_id)
)
-----------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT        |      |     1 |    43 |    14M  (8)| 49:41:52 |       |       |
|   1 |  DELETE                 | FACT |       |       |            |          |       |       |
|*  2 |   FILTER                |      |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ALL  |      |  1219K|    50M|  7797   (3)| 00:01:34 |     1 |    15 |
|   4 |     TABLE ACCESS FULL   | FACT |  1219K|    50M|  7797   (3)| 00:01:34 |     1 |    15 |
|*  5 |    TABLE ACCESS FULL    | ETL  |     2 |    38 |    13   (8)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------
This is the basic scenario: serial execution, very slow and inefficient.
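Here is a rough back-of-the-envelope reading of the serial plan. It assumes the FILTER in step 2 runs the ETL full scan (step 5, cost 13) once for every FACT row returned by step 3. Oracle can cache subquery results for repeated correlation values, so treat this as an approximate upper bound, not the optimizer's exact costing formula:

```latex
\underbrace{7\,797}_{\text{FACT scan}} + \underbrace{1\,219\,000}_{\text{FACT rows}} \times \underbrace{13}_{\text{ETL scan}} = 7\,797 + 15\,847\,000 \approx 15.85\text{M}
```

That is the same order of magnitude as the 14M reported for step 0. Nearly all of the cost comes from re-scanning ETL for each FACT row, not from scanning FACT itself.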
I thought I could speed it up with parallel execution:
explain plan for
delete /*+ PARALLEL(f,4) */ from fact f
where exists (
select 1
from etl e
where (e.id = f.id and e.somedate = f.somedate)
or (e.other_id = f.other_id)
)
--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT        |          |     1 |    43 |    14M  (8)| 49:40:44 |       |       |        |      |            |
|   1 |  DELETE                 | FACT     |       |       |            |          |       |       |        |      |            |
|*  2 |   FILTER                |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX COORDINATOR       |          |       |       |            |          |       |       |        |      |            |
|   4 |     PX SEND QC (RANDOM) | :TQ10000 |  1219K|    50M|  2152   (2)| 00:00:26 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   5 |      PX BLOCK ITERATOR  |          |  1219K|    50M|  2152   (2)| 00:00:26 |     1 |    15 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL | FACT     |  1219K|    50M|  2152   (2)| 00:00:26 |     1 |    15 |  Q1,00 | PCWP |            |
|*  7 |    TABLE ACCESS FULL    | ETL      |     2 |    38 |    13   (8)| 00:00:01 |       |       |        |      |            |
--------------------------------------------------------------------------------------------------------------------------------
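Reading this plan, I'm not sure what is actually parallel about it. The PX processes scan the table and send the rows to the query coordinator, which does the filtering. That is still not very efficient. If anything, I suspect the process allocation, scheduling and synchronization could even make things worse.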
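The time estimates in the two plans support this reading. Here are the differences between the serial and parallel plans, taken directly from their Time columns:

```latex
\begin{aligned}
\Delta t_{\text{statement}} &= 49{:}41{:}52 - 49{:}40{:}44 = 68\ \text{s} \\
\Delta t_{\text{FACT scan}} &= 00{:}01{:}34 - 00{:}00{:}26 = 94\ \text{s} - 26\ \text{s} = 68\ \text{s}
\end{aligned}
```

The estimated total drops by exactly the time saved on the FACT scan. As far as the plan shows, the remaining ~49 hours 40 minutes belong to the FILTER (step 2) and the ETL scan it drives (step 7). Both sit above the PX COORDINATOR with no TQ entry, so they still run serially in the query coordinator.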
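So, given that ETL is so small (~500KB), why shouldn't Oracle randomly split FACT among the PX processes and broadcast a copy of ETL to each of them? Oracle does the same thing for small NL joins using broadcast distribution. That way I would get the full parallel experience. So I have been trying to get Oracle to produce that execution plan, but so far without success: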
explain plan for
delete /*+ PARALLEL(f,4) PQ_DISTRIBUTE(@j e BROADCAST, NONE) */ from fact f
where exists (
select /*+ QB_NAME(j) */ 1
from etl e
where (e.id = f.id and e.somedate = f.somedate)
or (e.other_id = f.other_id)
)
I get the same execution plan as above.