
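I have a strange problem with PostgreSQL. For some reason the planner decides that accessing the data through the foreign-key index is very slow, and it uses a sequential scan instead.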

Here is a simplified table structure:

create table flight
(
    id        bigint not null constraint flight_pkey        primary key,
    parent_id bigint not null constraint flight_parent_fkey references flight
);

create table passenger
(
    id        bigint not null constraint passenger_pkey     primary key,
    flight_id bigint not null constraint pax_flight_fkey    references flight
);

create index pax_flight_id on passenger (flight_id);

The passenger table has about 14 million rows, and the flight table has about 60 thousand rows.

The problem is that I have a QueryDSL query with many optional conditions, one of which filters passengers by flight.

QPassenger qPassenger = QPassenger.passenger;
Long flightId = 123456L;
...

BooleanBuilder predicate = new BooleanBuilder();
predicate.and(qPassenger.flight.id.eq(flightId));

...

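When I try to fetch all passengers that match these optional conditions, it generates a query like the one below, which takes a full 30 seconds to execute. That is terrible. For some reason it uses a sequential scan and a hash join over the entire passenger table: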

select
    passenger0_.id        as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    passenger0_.flight_id = 123456
 or flight1_.parent_id = 123456

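However, after a day of searching for a solution, I found that filtering on the primary key of the flight table makes Postgres use the primary-key index: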

select
    passenger0_.id        as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    flight1_.id = 123456            --  ←this line!
 or flight1_.parent_id = 123456

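Unfortunately, I cannot filter the returned rows manually, because that would mean fetching about 13 million rows, whereas with flightId set there are only about 300 passengers per flight.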

➥ So my question is: is there a way to tell QueryDSL/Hibernate to use a specific column in this case, i.e. flight.id rather than passenger.flight_id?

Or, as an alternative question: what is wrong with my PostgreSQL planner, and how can I fix it?


UPD: the planner's plans:

  • Good query, using the primary key in the WHERE condition:
EXPLAIN ANALYZE
SELECT *
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE (f.id = 123456
     OR f.parent_id = 123456);
QUERY PLAN
Nested Loop  (cost=3.66..1759.26 rows=310 width=951) (actual time=0.044..0.242 rows=184 loops=1)
  ->  Bitmap Heap Scan on flights f  (cost=3.10..9.78 rows=6 width=240) (actual time=0.024..0.029 rows=4 loops=1)
        Recheck Cond: ((id = 123456) OR (parent_id = 123456))
        Heap Blocks: exact=4
        ->  BitmapOr  (cost=3.10..3.10 rows=6 width=0) (actual time=0.018..0.018 rows=0 loops=1)
              ->  Bitmap Index Scan on flights_pkey  (cost=0.00..1.53 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
                    Index Cond: (id = 123456)
              ->  Bitmap Index Scan on flt_parent_id_index  (cost=0.00..1.56 rows=5 width=0) (actual time=0.009..0.009 rows=3 loops=1)
                    Index Cond: (parent_id = 123456)
  ->  Index Scan using passenger_flight_id on passenger p  (cost=0.56..286.73 rows=485 width=711) (actual time=0.005..0.023 rows=46 loops=4)
        Index Cond: (flight_id = f.id)
Planning Time: 0.566 ms
Execution Time: 0.321 ms
  • Bad query, using the foreign key in the WHERE condition:
EXPLAIN ANALYZE
SELECT *
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE (p.flight_id = 123456
     OR f.parent_id = 123456);
QUERY PLAN
Gather  (cost=34194.96..3461993.92 rows=734 width=951) (actual time=79878.815..80711.129 rows=184 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Join  (cost=33194.96..3460920.52 rows=306 width=951) (actual time=71883.434..72044.345 rows=61 loops=3)
        Hash Cond: (p.flight_id = f.id)
        Join Filter: ((p.flight_id = 123456) OR (f.parent_id = 123456))
        Rows Removed by Join Filter: 11206038
        ->  Parallel Seq Scan on passenger p  (cost=0.00..827052.82 rows=14216282 width=711) (actual time=20.021..27298.757 rows=11206100 loops=3)
        ->  Parallel Hash  (cost=20891.65..20891.65 rows=275065 width=240) (actual time=1284.916..1284.917 rows=219796 loops=3)
              Buckets: 8192  Batches: 128  Memory Usage: 1248kB
              ->  Parallel Seq Scan on flights f  (cost=0.00..20891.65 rows=275065 width=240) (actual time=2.134..966.560 rows=219796 loops=3)
Planning Time: 0.605 ms
Execution Time: 80711.774 ms
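
Reading the two plans side by side: in the good query both OR branches refer to flights, so the condition is applied to flights before the join. In the bad query the OR mixes a passenger column with a flights column, so it shows up only as a Join Filter after both tables have been fully scanned (note the "Rows Removed by Join Filter: 11206038" line). One rewrite that may be worth comparing is to split the OR into a UNION so that each branch filters a single table. This is an untested sketch, not a measured result; table and column names are taken from the plans above, and it assumes every selected column type supports the equality comparison that UNION needs to remove duplicates.

```sql
-- Sketch only: same join as the slow query, with the OR split into two branches.
-- Each branch filters on one table, so the planner is free to pick an index for it.
EXPLAIN ANALYZE
SELECT p.*, f.*
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE p.flight_id = 123456    -- filter on passenger only
UNION
SELECT p.*, f.*
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE f.parent_id = 123456;   -- filter on flights only
```

UNION (rather than UNION ALL) is used here because a flight whose parent_id equals its own id would otherwise be returned by both branches. Since each joined row contains both primary keys, removing duplicates should give the same rows as the original OR.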