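I have a strange problem with PostgreSQL. For some reason the planner decides that accessing the data through the foreign key index is very slow and uses a sequential scan instead.
Here is a simplified table structure: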
create table flight
(
    id bigint not null constraint flight_pkey primary key,
    parent_id bigint not null constraint flight_parent_fkey references flight
);
create table passenger
(
    id bigint not null constraint passenger_pkey primary key,
    flight_id bigint not null constraint pax_flight_fkey references flight
);
create index pax_flight_id on passenger (flight_id);
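The passenger table has about 14 million rows, and the flight table has about 60 thousand rows.
The problem is that I have a QueryDSL query with many optional conditions, one of which filters passengers by flight.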
QPassenger qPassenger = QPassenger.passenger;
Long flightId = 123456L;
...
BooleanBuilder predicate = new BooleanBuilder();
predicate.and(qPassenger.flight.id.eq(flightId));
...
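When I try to fetch all passengers matching these optional conditions, it generates a query like the one below, which takes a full 30 seconds to execute. That is terrible. For some reason it uses a sequential scan and a hash join over the whole passenger table.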
select
    passenger0_.id as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    passenger0_.flight_id = 123456
    or flight1_.parent_id = 123456
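However, after a day of searching for a solution, I found that filtering on the primary key of the flight table makes Postgres use the primary key index: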
select
    passenger0_.id as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    flight1_.id = 123456 -- <- this line!
    or flight1_.parent_id = 123456
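Unfortunately, I can't filter the rows manually after fetching them, because without the filter the query returns about 13 million rows, while with flightId set there are only about 300 passengers per flight.
So, my question is: is there a way to tell QueryDSL/Hibernate to use a specific column in this case, that is, flight.id rather than passenger.flight_id?
Or, as a different question: what is wrong with my PostgreSQL planner, and how can I fix it?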
UPD: the planner's plans:
- Good query, using the primary key in the WHERE condition:
EXPLAIN ANALYZE
SELECT *
FROM passenger p JOIN flights f ON p.flight_id = f.id
WHERE (f.id = 123456
OR f.parent_id = 123456);
QUERY PLAN
Nested Loop (cost=3.66..1759.26 rows=310 width=951) (actual time=0.044..0.242 rows=184 loops=1)
  -> Bitmap Heap Scan on flights f (cost=3.10..9.78 rows=6 width=240) (actual time=0.024..0.029 rows=4 loops=1)
        Recheck Cond: ((id = 123456) OR (parent_id = 123456))
        Heap Blocks: exact=4
        -> BitmapOr (cost=3.10..3.10 rows=6 width=0) (actual time=0.018..0.018 rows=0 loops=1)
              -> Bitmap Index Scan on flights_pkey (cost=0.00..1.53 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
                    Index Cond: (id = 123456)
              -> Bitmap Index Scan on flt_parent_id_index (cost=0.00..1.56 rows=5 width=0) (actual time=0.009..0.009 rows=3 loops=1)
                    Index Cond: (parent_id = 123456)
  -> Index Scan using passenger_flight_id on passenger p (cost=0.56..286.73 rows=485 width=711) (actual time=0.005..0.023 rows=46 loops=4)
        Index Cond: (flight_id = f.id)
Planning Time: 0.566 ms
Execution Time: 0.321 ms
- Bad query, using the foreign key in the WHERE condition:
EXPLAIN ANALYZE
SELECT *
FROM passenger p JOIN flights f ON p.flight_id = f.id
WHERE (p.flight_id = 123456
OR f.parent_id = 123456);
QUERY PLAN
Gather (cost=34194.96..3461993.92 rows=734 width=951) (actual time=79878.815..80711.129 rows=184 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Hash Join (cost=33194.96..3460920.52 rows=306 width=951) (actual time=71883.434..72044.345 rows=61 loops=3)
        Hash Cond: (p.flight_id = f.id)
        Join Filter: ((p.flight_id = 123456) OR (f.parent_id = 123456))
        Rows Removed by Join Filter: 11206038
        -> Parallel Seq Scan on passenger p (cost=0.00..827052.82 rows=14216282 width=711) (actual time=20.021..27298.757 rows=11206100 loops=3)
        -> Parallel Hash (cost=20891.65..20891.65 rows=275065 width=240) (actual time=1284.916..1284.917 rows=219796 loops=3)
              Buckets: 8192 Batches: 128 Memory Usage: 1248kB
              -> Parallel Seq Scan on flights f (cost=0.00..20891.65 rows=275065 width=240) (actual time=2.134..966.560 rows=219796 loops=3)
Planning Time: 0.605 ms
Execution Time: 80711.774 ms
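For reference, both EXPLAIN ANALYZE runs above return the same 184 rows, which is what I would expect: once the join condition p.flight_id = f.id holds, testing p.flight_id = 123456 and testing f.id = 123456 should be the same check. Below is a small in-memory Java sketch of that reasoning. It is only an illustration, not what PostgreSQL or Hibernate does internally; the class name JoinFilterDemo, the helper methods, and the sample rows are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: models the two tables as plain lists.
class JoinFilterDemo {

    // Stand-in for a row of the flights table (illustrative only).
    static final class Flight {
        final long id;
        final long parentId;
        Flight(long id, long parentId) { this.id = id; this.parentId = parentId; }
    }

    // Stand-in for a row of the passenger table (illustrative only).
    static final class Passenger {
        final long id;
        final long flightId;
        Passenger(long id, long flightId) { this.id = id; this.flightId = flightId; }
    }

    // JOIN ON p.flight_id = f.id WHERE p.flight_id = ? OR f.parent_id = ?
    static List<Long> filterOnForeignKey(List<Passenger> passengers, List<Flight> flights, long flightId) {
        List<Long> ids = new ArrayList<>();
        for (Passenger p : passengers) {
            for (Flight f : flights) {
                boolean joined = p.flightId == f.id;
                if (joined && (p.flightId == flightId || f.parentId == flightId)) {
                    ids.add(p.id);
                }
            }
        }
        return ids;
    }

    // JOIN ON p.flight_id = f.id WHERE f.id = ? OR f.parent_id = ?
    static List<Long> filterOnPrimaryKey(List<Passenger> passengers, List<Flight> flights, long flightId) {
        List<Long> ids = new ArrayList<>();
        for (Passenger p : passengers) {
            for (Flight f : flights) {
                boolean joined = p.flightId == f.id;
                if (joined && (f.id == flightId || f.parentId == flightId)) {
                    ids.add(p.id);
                }
            }
        }
        return ids;
    }

    // Made-up sample data: flight 200 is a child of flight 123456.
    static List<Flight> sampleFlights() {
        List<Flight> flights = new ArrayList<>();
        flights.add(new Flight(1L, 1L));
        flights.add(new Flight(123456L, 1L));
        flights.add(new Flight(200L, 123456L));
        flights.add(new Flight(300L, 1L));
        return flights;
    }

    // Made-up sample data: passengers 1-3 should match, 4 and 5 should not.
    static List<Passenger> samplePassengers() {
        List<Passenger> passengers = new ArrayList<>();
        passengers.add(new Passenger(1L, 123456L));
        passengers.add(new Passenger(2L, 123456L));
        passengers.add(new Passenger(3L, 200L));
        passengers.add(new Passenger(4L, 300L));
        passengers.add(new Passenger(5L, 1L));
        return passengers;
    }

    public static void main(String[] args) {
        List<Long> byFk = filterOnForeignKey(samplePassengers(), sampleFlights(), 123456L);
        List<Long> byPk = filterOnPrimaryKey(samplePassengers(), sampleFlights(), 123456L);
        System.out.println("filter on passenger.flight_id: " + byFk);
        System.out.println("filter on flights.id:          " + byPk);
    }
}
```

So the two WHERE clauses differ only in which column the planner sees in the OR, not in which rows they select, which is why swapping the column looks like a safe workaround if the ORM can be made to generate it.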