
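I have a PostgreSQL query that is taking longer than I'd like. The EXPLAIN ANALYZE output I'm looking at mentions a Bitmap Index Scan. I've been searching the net and reading for about 10 minutes, but I can't figure out: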

Is a bitmap index a made-up thing (something I could improve on by adding a real index to some column somewhere), or is it a specific type of real index?

Here's the single table I'm querying:

bugbot4b=> \d bug_snapshots
             Table "public.bug_snapshots"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 fixin_id   | integer                     | not null
 created_on | timestamp without time zone | not null
 pain       | integer                     | not null
 status_id  | integer                     | not null
Indexes:
    "bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
    "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    "bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)

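Here are the results of the analyzed query. Note that the query contains about 3k distinct literal values for fixin_id (elided below), and the table has 900k rows. Counting only the rows in that particular time range yields 15,000 rows.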

EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
   AND ("status_id" IN (2, 7, 5, 3))
   AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
   AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
   AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";

Sort  (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
 Sort Key: created_on
 Sort Method:  quicksort  Memory: 18kB
 ->  HashAggregate  (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
       ->  Bitmap Heap Scan on bug_snapshots  (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
             Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
             Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
             ->  Bitmap Index Scan on bug_snapshots_pkey  (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
                   Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)

Does the EXPLAIN ANALYZE output tell me I need to add an index on fixin_id (and/or other fields) to improve the speed? Or is this just "slow" because of its size?

1 Answer

"Bitmap index scan"

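Postgres has no "bitmap indexes" as such. A "bitmap index scan" is a way of scanning an index that certain index types support, including the default btree. It first collects the locations (TIDs) of all matching rows into an in-memory bitmap, which the Bitmap Heap Scan then uses to read the table pages in physical order. It is particularly useful for combining several index lookups (BitmapAnd / BitmapOr). The manual: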

An index access method can support "plain" index scans, "bitmap" index scans, or both.
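To make the mechanism concrete, here is a toy Python model of the plan in the question (not Postgres code). Assumptions, all hypothetical: the "index" is a sorted list of (key, TID) pairs, a TID is a (page, slot) pair, and the heap is a dict of page -> list of rows. Real bitmaps can also degrade to lossy, page-level entries when work_mem is too small; that is the case in which the Recheck Cond actually matters.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy model only: TID = (page, slot); index = sorted list of (key, tid);
# heap = dict of page -> list of row dicts. Lossy bitmaps are not modeled.

def bitmap_index_scan(index, lo, hi):
    """Collect TIDs for lo <= key <= hi into a page -> {slots} bitmap."""
    keys = [key for key, _ in index]
    bitmap = defaultdict(set)
    for _, (page, slot) in index[bisect_left(keys, lo):bisect_right(keys, hi)]:
        bitmap[page].add(slot)
    return dict(bitmap)

def bitmap_and(a, b):
    """Like a BitmapAnd node: keep only TIDs found by both index scans."""
    both = {page: a[page] & b[page] for page in a.keys() & b.keys()}
    return {page: slots for page, slots in both.items() if slots}

def bitmap_heap_scan(heap, bitmap, row_filter):
    """Read each heap page once, in physical page order, then apply the
    filter for conditions the index could not check (e.g. pain < 999)."""
    result = []
    for page in sorted(bitmap):
        for slot in sorted(bitmap[page]):
            row = heap[page][slot]
            if row_filter(row):
                result.append(row)
    return result

# Hypothetical sample data: two heap pages with two rows each.
heap = {
    0: [{"fixin_id": 11, "pain": 5}, {"fixin_id": 40, "pain": 7}],
    1: [{"fixin_id": 12, "pain": 1000}, {"fixin_id": 11, "pain": 3}],
}
index = sorted((row["fixin_id"], (page, slot))
               for page, rows in heap.items()
               for slot, row in enumerate(rows))

bitmap = bitmap_index_scan(index, 11, 12)
print(bitmap)
print(bitmap_heap_scan(heap, bitmap, lambda r: r["pain"] < 999))
```

The point of the sorted page loop is that each table page is fetched at most once, no matter how many index entries point into it, which is why the planner likes this plan for thousands of fixin_id values.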

You can disable bitmap scans (for debugging purposes only!) by setting:

SET enable_bitmapscan = FALSE;

Optimize query performance

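For long lists, joining to a derived table is often faster than a lengthy IN expression. You can use VALUES or unnest() for that purpose, or even a temporary table, possibly with an index.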

SELECT created_on, sum(pain) AS sum_pain
FROM   unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN   bug_snapshots USING (fixin_id)
WHERE  status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;
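One caveat when swapping IN for a join, shown with a toy Python model (hypothetical rows and function names, not Postgres code): IN only tests membership, while a join emits one row per matching pair. As long as the array has no duplicate fixin_id values, both give the same sums; a duplicated value in the array would repeat the matching rows and inflate sum(pain).

```python
from collections import defaultdict

def sum_pain_in(rows, ids):
    """Like WHERE fixin_id IN (...): each row counts at most once."""
    wanted = set(ids)
    sums = defaultdict(int)
    for row in rows:
        if row["fixin_id"] in wanted:
            sums[row["created_on"]] += row["pain"]
    return dict(sums)

def sum_pain_join(rows, ids):
    """Like FROM unnest(array) JOIN bug_snapshots: one match per array element."""
    sums = defaultdict(int)
    for fixin_id in ids:
        for row in rows:
            if row["fixin_id"] == fixin_id:
                sums[row["created_on"]] += row["pain"]
    return dict(sums)

# Hypothetical sample rows.
rows = [
    {"fixin_id": 11, "created_on": "2013-10-09", "pain": 5},
    {"fixin_id": 12, "created_on": "2013-10-09", "pain": 2},
    {"fixin_id": 40, "created_on": "2013-10-10", "pain": 9},
]
print(sum_pain_in(rows, [11, 12]), sum_pain_join(rows, [11, 12]))
print(sum_pain_in(rows, [11, 11, 12]), sum_pain_join(rows, [11, 11, 12]))
```

So deduplicate the list before building the array if it can contain repeats.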

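A partial multicolumn index might help (a lot). That depends on details like data distribution, load, stable query conditions, etc. Most importantly, it depends on the selectivity of the WHERE expressions: a partial index typically only makes sense if it excludes many or most rows. Something like: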

CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE  status_id IN (2, 7, 5, 3)
AND    pain < 999;
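A rough way to reason about that selectivity, sketched in Python with hypothetical helper names: the partial index above would only contain rows matching its predicate, so the fraction of rows it keeps tells you how much smaller it is than a full index. On the real table you would measure the same fraction with a count(*) query; the lower it is, the more a partial index tends to pay off.

```python
# Predicate of the partial index: status_id IN (2, 7, 5, 3) AND pain < 999.
INDEX_STATUSES = {2, 7, 5, 3}

def in_partial_index(row):
    """True if the row satisfies the partial index predicate."""
    return row["status_id"] in INDEX_STATUSES and row["pain"] < 999

def partial_index(rows):
    """Entries of a btree on (fixin_id, created_on, pain), predicate rows only."""
    return sorted((r["fixin_id"], r["created_on"], r["pain"])
                  for r in rows if in_partial_index(r))

def coverage(rows):
    """Fraction of all rows that end up in the partial index."""
    return len(partial_index(rows)) / len(rows) if rows else 0.0
```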

The order of columns in the index matters. That's also true for your primary key, by the way, which implements another multicolumn index.
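A toy Python model of why column order matters, with a sorted list of tuples standing in for the btree (function names are hypothetical). With the primary key on (fixin_id, created_on), an equality on fixin_id plus a range on created_on maps to one contiguous slice per fixin_id, which matches the Index Cond in the plan above. A range on created_on alone has no such slice, because created_on is only the second sort key.

```python
from bisect import bisect_left, bisect_right

# entries: sorted list of (fixin_id, created_on) tuples, like the primary key.

def range_for_fixin(entries, fixin_id, lo, hi):
    """fixin_id = ? AND created_on BETWEEN lo AND hi: one contiguous slice."""
    start = bisect_left(entries, (fixin_id, lo))
    end = bisect_right(entries, (fixin_id, hi))
    return entries[start:end]

def range_without_fixin(entries, lo, hi):
    """created_on BETWEEN lo AND hi alone: matches are scattered across the
    whole index, so every entry has to be examined."""
    return [e for e in entries if lo <= e[1] <= hi]

# Hypothetical entries; small integers stand in for timestamps.
entries = sorted([(11, 1), (11, 5), (11, 9), (12, 2), (12, 6), (33, 5)])
print(range_for_fixin(entries, 11, 2, 9))
print(range_without_fixin(entries, 5, 6))
```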

db<>fiddle here
sqlfiddle

Performance tests in a fiddle are hardly reliable. Run your own tests! Postgres has also gained many improvements since this answer was written in 2013.

timestamp [without time zone]

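One more thing: bug_snapshots.created_on is of type timestamp. A timestamp value carries no time zone; when it is compared with a timestamptz, it is interpreted according to your current time zone setting.
But in the query you compare it with literals that include a time zone offset (timestamptz). That works as intended with an explicit cast: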

WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz

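With the cast, your literal is taken as timestamptz and translated to your local time zone accordingly. However, since you don't provide a data type, Postgres casts your literal to the matching type timestamp (not timestamptz) and ignores the time zone offset. Most likely not your intention!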
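A small Python illustration of the two interpretations of the first literal in your query, with Python's datetime standing in for Postgres' casts. The session time zone is assumed to be UTC purely for the example (SESSION_TZ is a hypothetical name for your TimeZone setting).

```python
from datetime import datetime, timezone

LITERAL = "2013-10-08 16:42:26.994994-0700"
aware = datetime.strptime(LITERAL, "%Y-%m-%d %H:%M:%S.%f%z")

# Untyped literal compared with a timestamp column: it is read as timestamp
# and the -0700 offset is simply dropped.
as_timestamp = aware.replace(tzinfo=None)

# Explicit ::timestamptz: the offset is honored. Comparing with a timestamp
# column then amounts to comparing local wall-clock times in the session zone.
SESSION_TZ = timezone.utc  # assumption for this example only
as_local = aware.astimezone(SESSION_TZ).replace(tzinfo=None)

print(as_timestamp)
print(as_local)
```

With a UTC session, the two readings of the same literal are seven hours apart, so the query filters on a different time window than intended.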

Consider this test:

SELECT min(created_on), max(created_on)
FROM   bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'
AND    created_on <= '2013-11-07 15:42:26.994994-0800';


answered 2013-11-08T22:28:12.247