performance - Why does PostgreSQL produce such a heavy plan for a simple query?

Question

I have a table "Zemla" with 25 million rows and an index:

CREATE INDEX zemla_level
  ON public."Zemla"
  USING btree
  (level);

Now I run a simple query

select * from "Zemla" where level = 7

and get a very heavy query plan

Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=181) (actual time=216.681..758.663 rows=975247 loops=1)
  Recheck Cond: (level = 7)
  Heap Blocks: exact=54465
  ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=198.041..198.041 rows=1949202 loops=1)
        Index Cond: (level = 7)

and another simple query, which I thought should execute immediately given that the index exists

select count(*) from "Zemla" where level = 7

Aggregate  (cost=639149.25..639149.26 rows=1 width=0) (actual time=1188.366..1188.366 rows=1 loops=1)
  ->  Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=0) (actual time=213.918..763.833 rows=975247 loops=1)
        Recheck Cond: (level = 7)
        Heap Blocks: exact=54465
        ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=195.409..195.409 rows=1949202 loops=1)
              Index Cond: (level = 7)

My question is: why does PostgreSQL perform a bitmap heap scan after the initial index scan, at such a large cost?

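Edit: What is a "Bitmap heap scan" in a query plan? is a different question, because it answers why some queries using the OR operator have a bitmap heap scan. My query has neither an OR nor an AND operator.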

score 1 · Accepted Answer

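If I remember correctly, a bitmap heap scan is an algorithm for fetching data from disk. It takes all the heap pages that the index scan says the engine must fetch and visits them in physical order, to minimize hard drive head movement.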

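This takes time because your table must be very large, and the matching rows are probably scattered widely across the disk: the plan shows 54,465 heap blocks being read to return roughly 975,000 rows.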
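One way to check how large the table is, and how scattered the level values are on disk, is to look at the system catalogs. This is only a rough sketch: it assumes the table lives in the public schema, as in the index definition in the question, and that ANALYZE has run recently so that pg_stats is populated.

```sql
-- On-disk size of the table itself, and of the table plus its indexes and TOAST data.
SELECT pg_size_pretty(pg_relation_size('public."Zemla"'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('public."Zemla"')) AS total_size;

-- Physical ordering of the "level" column: a correlation near 1 or -1 means
-- rows are stored roughly in "level" order; a value near 0 means the rows
-- for a given level are scattered across many pages.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'Zemla'
  AND attname    = 'level';
```

A correlation close to 0 would be consistent with the large number of heap blocks touched in the plan.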

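For your second query, count(*), PostgreSQL still needs to read the matching rows from the table to verify that they exist and are visible to your transaction, because the index alone does not carry that information; other database systems might only need to consult the index in this case. See this page for more information: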

https://wiki.postgresql.org/wiki/Index-only_scans
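As a hedged sketch of how to check whether an index-only scan is realistic here: index-only scans are available from PostgreSQL 9.2 onward, and the planner tends to choose them only when most of the table's pages are marked all-visible in the visibility map, which VACUUM maintains. The queries below assume that version or later; the plan you actually get depends on your data and settings.

```sql
-- How many of the table's pages are currently marked all-visible (PostgreSQL 9.2+).
SELECT relpages, relallvisible
FROM pg_class
WHERE oid = 'public."Zemla"'::regclass;

-- Re-run the count and look for an "Index Only Scan" node and its
-- "Heap Fetches" line in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM "Zemla" WHERE level = 7;
```

If relallvisible is much smaller than relpages, PostgreSQL would still have to visit most heap pages, so an index-only scan would bring little benefit.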

Try running VACUUM FULL on the table and see whether it speeds things up.
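A minimal sketch of that suggestion follows. Keep in mind that VACUUM FULL rewrites the whole table and holds an exclusive lock while it runs, so on a 25-million-row table it may block other sessions for some time. A plain VACUUM (ANALYZE) is a lighter step that refreshes the visibility map and planner statistics without rewriting the table. Whether either one changes the plan is something to measure rather than assume.

```sql
-- Rewrites the table compactly; holds an ACCESS EXCLUSIVE lock for the duration.
VACUUM FULL "Zemla";

-- Lighter step: no table rewrite, updates the visibility map and planner statistics.
VACUUM (ANALYZE) "Zemla";

-- Compare the timings with the plans shown in the question.
EXPLAIN ANALYZE
SELECT count(*) FROM "Zemla" WHERE level = 7;
```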
