postgresql - Postgres similarity function not appropriately using trigram index

Question

I have a simple person table with a last_name column, to which I've added a GiST index:

CREATE INDEX last_name_idx ON person USING gist (last_name gist_trgm_ops);

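According to the documentation at https://www.postgresql.org/docs/10/pgtrgm.html, the <-> operator should use this index. However, when I actually try to use this distance operator in a query like this: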

explain verbose select * from person where last_name <-> 'foobar' > 0.5;

I get this:

Seq Scan on public.person  (cost=0.00..290.82 rows=4485 width=233)
  Output: person_id, first_name, last_name
  Filter: ((person.last_name <-> 'foobar'::text) < '0.5'::double precision)

It doesn't look like the index is being used. However, if I use the % operator with this command:

explain verbose select * from person where last_name % 'foobar';

it does seem to use the index:

Bitmap Heap Scan on public.person  (cost=4.25..41.51 rows=13 width=233)
  Output: person_id, first_name, last_name
  Recheck Cond: (person.last_name % 'foobar'::text)
  ->  Bitmap Index Scan on last_name_idx  (cost=0.00..4.25 rows=13 width=0)
        Index Cond: (person.last_name % 'foobar'::text)

I also noticed that if I move the operator into the select list of the query, the index is ignored again:

explain verbose select last_name % 'foobar' from person;

Seq Scan on public.person  (cost=0.00..257.19 rows=13455 width=1)
  Output: (last_name % 'foobar'::text)

Am I missing something obvious about how the similarity functions use the trigram index?

I am using Postgres 10.5 on OS X.

Edit 1

As per Laurenz's suggestion, I tried setting enable_seqscan = off, but unfortunately the query using the <-> operator still seems to ignore the index.

show enable_seqscan;
 enable_seqscan
----------------
 off

explain verbose select * from person where last_name <-> 'foobar' < 0.5;

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.person  (cost=10000000000.00..10000000290.83 rows=4485 width=233)
   Output: person_id, first_name, last_name
   Filter: ((person.last_name <-> 'foobar'::text) < '0.5'::double precision)

score 1 · Accepted Answer

This behavior is normal for all kinds of indexes.

The first query is not in a form that can use the index. For that, the condition would have to be of the form

<indexed expression> <operator supported by the index> <quasi-constant>

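where the last expression remains constant for the duration of the index scan and the operator returns a boolean value. Your expression `last_name <-> 'foobar' > 0.5` is not of that form: its top-level operator is the comparison with 0.5, and the left-hand side of that comparison is the result of <-> rather than the indexed column itself.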
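
A thresholded search can usually be brought into that form by using the % operator, which compares the similarity against the pg_trgm.similarity_threshold setting. The following is only a sketch under assumptions: the 0.5 threshold is taken from the question, distance is treated as 1 minus similarity (so a distance below 0.5 corresponds to a similarity above 0.5), and behavior exactly at the boundary may differ slightly.

```sql
-- Sketch: express "distance below 0.5" as "similarity above the threshold".
-- Assumes the pg_trgm extension is installed and last_name_idx exists.
SET pg_trgm.similarity_threshold = 0.5;

-- <indexed expression> % <constant>: a boolean operator the trigram index supports.
EXPLAIN
SELECT *
FROM person
WHERE last_name % 'foobar';
```

Whether the planner actually picks the index still depends on table size and cost estimates.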

The <-> operator has to be used in an ORDER BY clause to be able to use the index.
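
As a rough illustration of that ORDER BY form, a nearest-neighbour query against the table from the question could look like the sketch below. The LIMIT of 10 is an arbitrary value chosen for the example, not something from the question.

```sql
-- Sketch: the distance operator appears in ORDER BY, so a GiST trigram
-- index can return rows in order of increasing distance.
-- LIMIT 10 is an assumed value for illustration.
EXPLAIN
SELECT *
FROM person
ORDER BY last_name <-> 'foobar'
LIMIT 10;
```

With the GiST index in place, the plan would be expected to show an index scan on last_name_idx with an Order By line instead of a sequential scan followed by a sort, although the planner's choice can vary with the data.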

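The third query doesn't use the index because it affects all rows of the table. An index does not speed up the evaluation of an expression; it is only useful for quickly identifying a subset of the table (or for fetching rows in a certain sort order).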
