postgresql - Postgres: combining multiple indexes

Question

I have the following table and indexes:

CREATE TABLE test
(
   coords geography(Point,4326), 
   user_id varchar(50), 
   created_at timestamp
);
CREATE INDEX ix_coords ON test USING GIST (coords);
CREATE INDEX ix_user_id ON test (user_id);
CREATE INDEX ix_created_at ON test (created_at DESC);

Here is the query I want to run:

select *
from test
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
and user_id='3212312'
order by created_at desc
limit 60;

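When I run the query, it only uses the ix_coords index. How can I make sure Postgres also uses the ix_user_id and ix_created_at indexes for this query?

This is a new table into which I bulk-inserted production data. Total rows in the test table: 15,069,489

I am running PostgreSQL 9.2.1 (with PostGIS) with effective_cache_size = 2GB. This is on my local OS X machine with 16GB RAM, a Core i7/2.5 GHz, and a non-SSD disk.

Adding the EXPLAIN ANALYZE output: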

Limit  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.652..1278.665 rows=60 loops=1)
   ->  Sort  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.651..1278.662 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 33kB
         ->  Index Scan using ix_coords on test  (cost=0.00..71.63 rows=1 width=280) (actual time=0.198..1278.227 rows=178 loops=1)
               Index Cond: (coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography)
               Filter: (((user_id)::text = '4f1092000b921a000100015c'::text) AND ('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 3122459
 Total runtime: 1278.701 ms

Update:

Following the suggestion below, I tried an index on coords + user_id:

CREATE INDEX ix_coords_and_user_id ON updates USING GIST (coords, user_id);

...but got the following error:

ERROR:  data type character varying has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

Update:

So CREATE EXTENSION btree_gist; solved the btree/GiST composite index problem. My index now looks like this:

CREATE INDEX ix_coords_user_id_created_at ON test USING GIST (coords, user_id, created_at);

Note: btree_gist does not accept DESC/ASC.

New query plan:

Limit  (cost=134.99..135.00 rows=1 width=280) (actual time=273.282..273.292 rows=60 loops=1)
   ->  Sort  (cost=134.99..135.00 rows=1 width=280) (actual time=273.281..273.285 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 41kB
         ->  Index Scan using ix_updates_coords_user_id_created_at on updates  (cost=0.00..134.98 rows=1 width=280) (actual time=0.406..273.110 rows=115 loops=1)
               Index Cond: ((coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography) AND ((user_id)::text = '4e952bb5b9a77200010019ad'::text))
               Filter: (('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 1
 Total runtime: 273.331 ms

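The query performs better than before, almost a second faster, but it is still not great. I guess this is the best I can get? I was hoping for somewhere around 60-80 ms. Also, removing order by created_at desc from the query shaves off another 100 ms, which means the sort cannot use the index. Is there any way to fix this?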

score 5 · Accepted Answer

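I don't know whether Pg can combine a GiST index and regular b-tree indexes with a bitmap index scan, but I suspect not. You may already be getting the best result possible without adding a user_id column to your GiST index (which would make it bigger and slower for other queries that don't use user_id).

As an experiment, you could: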

CREATE EXTENSION btree_gist;
CREATE INDEX ix_coords_and_user_id ON test USING GIST (coords, user_id);

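This is likely to result in a big index, but it might speed up that query, if it works. Be aware that maintaining such an index will significantly slow INSERTs and UPDATEs. If you drop the old ix_coords, your queries will use ix_coords_and_user_id even if they don't filter on user_id, but it will be slower than ix_coords. Keeping both will make the INSERT and UPDATE slowdown even worse.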
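To get a feel for how much bigger the combined index is, the on-disk sizes can be compared. This is a minimal sketch, assuming both ix_coords and ix_coords_and_user_id exist under those names; pg_relation_size and pg_size_pretty are built-in PostgreSQL functions.

```sql
-- Compare the on-disk size of the plain GiST index and the combined one.
-- Assumes both indexes have already been created with these names.
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('ix_coords', 'ix_coords_and_user_id');
```

A much larger combined index would suggest keeping it only if the user_id-filtered query is frequent enough to justify the extra write cost.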

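See btree-gist.

(Obsoleted by an edit to the question that changed it completely; when this was written, the user had a multicolumn index that they have since split into two separate ones):

You don't seem to be filtering or sorting on user_id, only on create_date. Pg won't (can't?) use only the second term of a multicolumn index such as (user_id, create_date); it needs to use the first term too.

If you want to index create_date, create a separate index for it. If you use and need the (user_id, create_date) index and don't generally use user_id alone, see whether you can reverse the column order. Alternatively, create two independent indexes, (user_id) and (create_date). When both columns are needed, Pg can combine the two independent indexes using a bitmap index scan.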
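A rough way to see whether the planner combines two independent b-tree indexes is to run EXPLAIN on a query that filters on both columns. This sketch assumes the question's current schema (table test with ix_user_id and ix_created_at, where the column is named created_at rather than create_date); the seven-day range is an arbitrary illustration. Whether a BitmapAnd node actually appears depends on the table statistics and on how selective each condition is.

```sql
-- Assumes the indexes from the question already exist:
--   ix_user_id    ON test (user_id)
--   ix_created_at ON test (created_at DESC)
-- Filter on both columns and inspect the plan for a BitmapAnd node
-- sitting above two Bitmap Index Scan nodes.
EXPLAIN
SELECT *
FROM test
WHERE user_id = '3212312'
  AND created_at >= now() - interval '7 days';
```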

score 2 · Accepted Answer

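I think Craig's answer is correct, but I just wanted to add a few things (they didn't fit in a comment).

You have to work pretty hard to force PostgreSQL to use an index. The query optimizer is smart, and sometimes it will decide that a sequential table scan would be faster. It is usually right! :) However, there are some settings you can play with (such as seq_page_cost, random_page_cost, etc.) to try to get it to favor an index. If you feel it is not making the right decision, the planner configuration settings are worth examining. But, again... my experience is that most of the time, Postgres is smarter than I am! :)

Hope this helps you (or someone in the future).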
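One way to experiment with these settings is to change them for the current session only and compare plans. This is a sketch, not a tuning recommendation: the value 2.0 is an arbitrary illustration (the default random_page_cost is 4.0), and enable_seqscan = off is meant purely as a diagnostic.

```sql
-- Session-local experiment; postgresql.conf is not modified.
SET random_page_cost = 2.0;  -- lower values make index scans look cheaper
SET enable_seqscan = off;    -- diagnostic only: strongly discourages seq scans

EXPLAIN ANALYZE
SELECT *
FROM test
WHERE ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
  AND user_id = '3212312'
ORDER BY created_at DESC
LIMIT 60;

-- Restore the defaults for this session.
RESET random_page_cost;
RESET enable_seqscan;
```

If the plan and timing do not improve with these changes, the planner's original choice was probably reasonable.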
