I have a case where orphaned records need to be cleaned up periodically, so I'm looking for a high-performance solution. I tried using an "IN" clause, but it is not very fast. The columns in both tables have all the required indexes (id - primary key, component_id - indexed, component_type - indexed).
DELETE FROM component_apportionment
WHERE id IN (
    SELECT a.id
    FROM component_apportionment a
    LEFT JOIN component_live c
           ON c.component_id   = a.component_id
          AND c.component_type = a.component_type
    WHERE c.id IS NULL
);
Basically, the task is to delete records from the "component_apportionment" table that do not exist in the "component_live" table.
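For clarity, the same intent written with NOT EXISTS over both join columns would look like this (just a sketch, reusing the table and column names from the query above):

DELETE FROM component_apportionment a
WHERE NOT EXISTS (
    SELECT 1
    FROM component_live c
    WHERE c.component_id   = a.component_id
      AND c.component_type = a.component_type
);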
The query plan for the IN query above is also bad:
Delete on component_apportionment_copy1 (cost=3860927.55..3860929.09 rows=1 width=18) (actual time=183479.848..183479.848 rows=0 loops=1)
-> Nested Loop (cost=3860927.55..3860929.09 rows=1 width=18) (actual time=183479.811..183479.813 rows=1 loops=1)
-> HashAggregate (cost=3860927.12..3860927.13 rows=1 width=20) (actual time=183479.793..183479.793 rows=1 loops=1)
Group Key: a.id
-> Merge Right Join (cost=3753552.72..3860927.12 rows=1 width=20) (actual time=172941.125..183479.787 rows=1 loops=1)
Merge Cond: ((c.component_id = a.component_id) AND ((c.component_type)::text = (a.component_type)::text))
Filter: (c.id IS NULL)
Rows Removed by Filter: 5968195
-> Sort (cost=3390767.32..3413658.29 rows=9156391 width=21) (actual time=169852.438..172642.897 rows=8043013 loops=1)
Sort Key: c.component_id, c.component_type
Sort Method: external merge Disk: 310232kB
-> Seq Scan on component_live c (cost=0.00..2117393.91 rows=9156391 width=21) (actual time=0.004..155656.568 rows=9333382 loops=1)
-> Materialize (cost=362785.40..375049.75 rows=2452871 width=21) (actual time=3088.653..5343.013 rows=5968195 loops=1)
-> Sort (cost=362785.40..368917.58 rows=2452871 width=21) (actual time=3088.648..3989.163 rows=2452871 loops=1)
Sort Key: a.component_id, a.component_type
Sort Method: external merge Disk: 81504kB
-> Seq Scan on component_apportionment_copy1 a (cost=0.00..44969.71 rows=2452871 width=21) (actual time=0.920..882.040 rows=2452871 loops=1)
-> Index Scan using component_apportionment_copy1_pkey on component_apportionment_copy1 (cost=0.43..1.95 rows=1 width=14) (actual time=0.012..0.012 rows=1 loops=1)
Index Cond: (id = a.id)
Planning time: 5.573 ms
Execution time: 183554.675 ms
Any help would be appreciated. Thanks.
Note
In the worst case each table will have around 80 million records. Both tables have indexes on the columns used.
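For reference, here is a minimal sketch of the schema and indexes as I understand them (the column types are assumed; the real tables have more columns):

CREATE TABLE component_live (
    id             bigserial PRIMARY KEY,
    component_id   bigint NOT NULL,
    component_type varchar(100) NOT NULL  -- type assumed; the plan casts it to text
    -- other columns omitted
);
CREATE INDEX ON component_live (component_id);
CREATE INDEX ON component_live (component_type);

CREATE TABLE component_apportionment (
    id             bigserial PRIMARY KEY,
    component_id   bigint NOT NULL,
    component_type varchar(100) NOT NULL
    -- other columns omitted
);
CREATE INDEX ON component_apportionment (component_id);
CREATE INDEX ON component_apportionment (component_type);

A composite index on (component_id, component_type) is something I could add on either table if it would help.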
Update
Query plan for the "NOT EXISTS" version.
Query:
EXPLAIN (analyze, verbose, buffers)
DELETE FROM component_apportionment_copy1
WHERE NOT EXISTS (
    SELECT 1
    FROM component_live c
    WHERE c.component_id = component_apportionment_copy1.component_id
);
Delete on vector.component_apportionment_copy1 (cost=2276557.80..2446287.39 rows=2104532 width=12) (actual time=203643.560..203643.560 rows=0 loops=1)
Buffers: shared hit=20875 read=2025400, temp read=46067 written=45813
-> Hash Anti Join (cost=2276557.80..2446287.39 rows=2104532 width=12) (actual time=202212.975..203643.486 rows=1 loops=1)
Output: component_apportionment_copy1.ctid, c.ctid
Hash Cond: (component_apportionment_copy1.component_id = c.component_id)
Buffers: shared hit=20874 read=2025400, temp read=46067 written=45813
-> Seq Scan on vector.component_apportionment_copy1 (cost=0.00..44969.71 rows=2452871 width=10) (actual time=0.003..659.668 rows=2452871 loops=1)
Output: component_apportionment_copy1.ctid, component_apportionment_copy1.component_id
Buffers: shared hit=20441
-> Hash (cost=2117393.91..2117393.91 rows=9156391 width=10) (actual time=198536.786..198536.786 rows=9333382 loops=1)
Output: c.ctid, c.component_id
Buckets: 16384 Batches: 128 Memory Usage: 3195kB
Buffers: shared hit=430 read=2025400, temp written=36115
-> Seq Scan on vector.component_live c (cost=0.00..2117393.91 rows=9156391 width=10) (actual time=0.039..194415.641 rows=9333382 loops=1)
Output: c.ctid, c.component_id
Buffers: shared hit=430 read=2025400
Planning time: 6.639 ms
Execution time: 203643.594 ms
It does a seq scan on both tables, and the more data there is, the slower it will be.
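One more observation: both plans spill to disk (the external merge sorts in the first plan, and 128 hash batches with only ~3 MB of memory usage in the second), which suggests work_mem is quite low for this workload. If it is relevant, this is roughly how I would retest with a larger per-session work_mem (the value is only illustrative):

SET work_mem = '512MB';  -- session-level only; illustrative value

EXPLAIN (analyze, verbose, buffers)
DELETE FROM component_apportionment_copy1
WHERE NOT EXISTS (
    SELECT 1
    FROM component_live c
    WHERE c.component_id = component_apportionment_copy1.component_id
);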