sql - What is a "bitmap index"?

Question

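I have a PostgreSQL query that is taking longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE, and it mentions a Bitmap Index Scan. I've been searching the 'net and reading for about 10 minutes, but I can't figure out:

Is a bitmap index something made up on the fly, which I could improve by adding a real index on some column somewhere, or is it a specific type of real index?

Here's the single table I'm querying: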

bugbot4b=> \d bug_snapshots
             Table "public.bug_snapshots"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 fixin_id   | integer                     | not null
 created_on | timestamp without time zone | not null
 pain       | integer                     | not null
 status_id  | integer                     | not null
Indexes:
    "bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
    "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    "bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)

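Here's the result of the analyzed query. Note that there are about 3k distinct fixin_id literal values in the query (elided below), and the table has 900k rows. Counting just the rows in that particular time range gives 15,000 rows.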

EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
   AND ("status_id" IN (2, 7, 5, 3))
   AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
   AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
   AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";

Sort  (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
 Sort Key: created_on
 Sort Method:  quicksort  Memory: 18kB
 ->  HashAggregate  (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
       ->  Bitmap Heap Scan on bug_snapshots  (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
             Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
             Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
             ->  Bitmap Index Scan on bug_snapshots_pkey  (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
                   Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)

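Is the output of EXPLAIN ANALYZE telling me that I need to add an index on fixin_id (and/or other fields) to improve speed? Or is this just "slow" because of its size?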

score 4 · Accepted Answer

"Bitmap index scan"

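Postgres does not have "bitmap indexes" per se. A "bitmap index scan" is an index access method that is supported by certain index types (including the default btree index). It is particularly useful for combining multiple index lookups. The manual:

An index access method can support "plain" index scans, "bitmap" index scans, or both.

You can disable bitmap scans (for debugging purposes only!) with: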

SET enable_bitmapscan = FALSE;
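To make the mechanics concrete, here is a toy Python model of a bitmap index scan followed by a bitmap heap scan: each index lookup collects the locations (TIDs) of matching rows, the TIDs are grouped into a per-page bitmap, bitmaps from several lookups can be combined with AND/OR, and each table page is then read once, in physical order, with the remaining conditions applied as a filter. This is a sketch under loud assumptions, not PostgreSQL's implementation: the tiny heap, the hypothetical single-column indexes on fixin_id and status_id (your table has no such indexes), and all function names are invented for illustration, and a TID is modeled as a (page, slot) tuple.

```python
from collections import defaultdict

# Hypothetical heap: page number -> rows stored on that page.
# Each row is (fixin_id, status_id, pain); a TID is modeled as (page, slot).
HEAP = {
    0: [(11, 2, 5), (12, 9, 1), (33, 7, 1200)],
    1: [(11, 3, 40), (99, 2, 7)],
    2: [(33, 5, 3), (12, 2, 999)],
}


def index_lookup(column, matches):
    """Stand-in for one index scan: return the TIDs of rows whose value in
    `column` satisfies `matches`. (A real index would not scan the heap.)"""
    return {
        (page, slot)
        for page, page_rows in HEAP.items()
        for slot, row in enumerate(page_rows)
        if matches(row[column])
    }


def to_bitmap(tids):
    """Group TIDs by page: page -> set of slots."""
    bitmap = defaultdict(set)
    for page, slot in tids:
        bitmap[page].add(slot)
    return dict(bitmap)


def bitmap_and(a, b):
    """Keep only TIDs present in both bitmaps (like a BitmapAnd node)."""
    out = {}
    for page in a.keys() & b.keys():
        slots = a[page] & b[page]
        if slots:
            out[page] = slots
    return out


def bitmap_or(a, b):
    """Keep TIDs present in either bitmap (like a BitmapOr node)."""
    return {p: a.get(p, set()) | b.get(p, set()) for p in a.keys() | b.keys()}


def bitmap_heap_scan(bitmap, keep):
    """Read each listed page once, in physical order, and apply the filter."""
    result = []
    for page in sorted(bitmap):
        page_rows = HEAP[page]  # one page read, however many slots match
        for slot in sorted(bitmap[page]):
            row = page_rows[slot]
            if keep(row):
                result.append((page, slot, row))
    return result


# fixin_id IN (11, 12, 33) AND status_id IN (2, 7, 5, 3), then Filter: pain < 999
by_fixin = to_bitmap(index_lookup(0, lambda v: v in {11, 12, 33}))
by_status = to_bitmap(index_lookup(1, lambda v: v in {2, 7, 5, 3}))
combined = bitmap_and(by_fixin, by_status)
rows = bitmap_heap_scan(combined, keep=lambda row: row[2] < 999)
print(rows)
```

In this model the bitmap is what lets several lookups be merged cheaply and each page be visited only once. PostgreSQL additionally degrades a bitmap to page-level ("lossy") entries when it outgrows work_mem, in which case every row on such a page must be re-tested; that is what the Recheck Cond line in your plan is for. The toy model does not simulate lossy bitmaps.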

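Optimize query performance

For long lists, joining to a derived table is often faster than a lengthy IN expression. You can use VALUES or unnest() for that purpose, or even a temporary table, possibly with an index. See:

Query table by indexes from integer array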

SELECT created_on, sum(pain) AS sum_pain
FROM   unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN   bug_snapshots USING (fixin_id)
WHERE  status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;

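A partial multicolumn index might help (a lot). That depends on details like data distribution, load, stable query conditions, etc. Most importantly, it depends on the selectivity of the WHERE expressions: a partial index typically only makes sense if it excludes many or most rows. Something like: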

CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE  status_id IN (2, 7, 5, 3)
AND    pain < 999;
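Whether such a partial index pays off hinges on the share of rows its WHERE clause keeps. The arithmetic is trivial but worth spelling out; the sketch below assumes a hypothetical sample of (status_id, pain) pairs (not data from the question) and a helper name, partial_index_coverage, invented for illustration. Against the real table you would compute the same ratio with an aggregate query.

```python
def partial_index_coverage(rows, predicate):
    """Fraction of rows that satisfy the partial index's WHERE clause."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


def in_partial_index(row):
    # Mirrors: WHERE status_id IN (2, 7, 5, 3) AND pain < 999
    status_id, pain = row
    return status_id in (2, 7, 5, 3) and pain < 999


# Hypothetical (status_id, pain) sample, invented for illustration only.
sample = [(2, 10), (7, 1500), (9, 3), (3, 998), (1, 5), (5, 999), (2, 0), (8, 42)]

coverage = partial_index_coverage(sample, in_partial_index)
print(f"indexed: {coverage:.1%}, excluded: {1 - coverage:.1%}")
```

Here 3 of the 8 sample rows match, so the index would hold 37.5% of the rows and leave out the other 62.5%.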

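The order of columns in the index matters. That is also true for your primary key, btw, which implements another multicolumn index. See:

Is a composite index also good for queries on the first field?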
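Why column order matters can be sketched with a sorted list standing in for a btree on (fixin_id, created_on): entries are ordered by fixin_id first, and by created_on only within equal fixin_id values. The toy model below (the entries, day numbers in place of timestamps, and the function names are all hypothetical) uses bisect to show that equality on the leading column plus a range on the second column is one contiguous slice, while a range on created_on alone matches entries scattered across the whole list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries (fixin_id, created_on), with created_on simplified
# to a day number. A btree on (fixin_id, created_on) keeps them in this order.
entries = sorted([(11, 3), (33, 1), (11, 7), (12, 2), (33, 5), (12, 9), (11, 1)])


def leading_eq_plus_range(fixin_id, lo, hi):
    """fixin_id = X AND created_on BETWEEN lo AND hi: one contiguous slice."""
    start = bisect_left(entries, (fixin_id, lo))
    stop = bisect_right(entries, (fixin_id, hi))
    return entries[start:stop]


def second_column_only(lo, hi):
    """created_on BETWEEN lo AND hi alone: matches are scattered, so every
    entry has to be examined."""
    return [e for e in entries if lo <= e[1] <= hi]


print(entries)
print(leading_eq_plus_range(11, 2, 7))
print(second_column_only(2, 5))
print([entries.index(e) for e in second_column_only(2, 5)])
```

The second query's matches sit at positions 1, 3 and 6 of the sorted list, so there is no single range to read. A real btree faces the same problem, which is why a condition on the leading column is usually much more efficient.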

db<>fiddle here

Performance tests in a fiddle are hardly reliable. Run your own tests! Postgres has also seen many improvements since this answer was written in 2013!

`timestamp [without time zone]`

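One more thing: bug_snapshots.created_on is of type timestamp, which stores no time zone. When it is compared with a timestamptz value, it is interpreted according to your current time zone setting.
But in the query you try to compare it with literals that carry a time zone offset (timestamptz). That works with an explicit cast: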

WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz

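Your literal would then be cast to timestamptz and translated to your local time zone accordingly. However, since you don't provide a data type, Postgres casts your literal to the matching type timestamp (not timestamptz) and silently ignores the time zone offset. Most probably not what you intended!

Consider this test: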
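The difference can be mimicked with Python's datetime, as a rough analogy rather than PostgreSQL's own code: dropping the offset corresponds to the untyped literal being read as timestamp, while converting the instant to the session time zone first roughly corresponds to the explicit ::timestamptz cast. The session time zone is assumed to be UTC here purely for illustration; with a different TimeZone setting the shift differs.

```python
from datetime import datetime, timezone

LITERAL = "2013-10-08 16:42:26.994994-0700"
instant = datetime.strptime(LITERAL, "%Y-%m-%d %H:%M:%S.%f%z")

# Untyped literal compared with a timestamp column: the offset is discarded.
as_timestamp = instant.replace(tzinfo=None)

# Explicit ::timestamptz: the instant is preserved, and the comparison with the
# timestamp column happens in the session time zone (assumed UTC here).
SESSION_TZ = timezone.utc
as_session_local = instant.astimezone(SESSION_TZ).replace(tzinfo=None)

print(as_timestamp)                      # 2013-10-08 16:42:26.994994
print(as_session_local)                  # 2013-10-08 23:42:26.994994
print(as_session_local - as_timestamp)   # 7:00:00
```

Under these assumptions the two interpretations of the same literal are seven hours apart, so the time window in the query silently shifts.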

SELECT min(created_on), max(created_on)
FROM   bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'
AND    created_on <= '2013-11-07 15:42:26.994994-0800';

See:

Ignoring time zones altogether in Rails and PostgreSQL
