sql - How to tune the PostgreSQL query optimizer when using the SQL logical "OR" together with the "LIKE" operator?

Question

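How can I tune the PostgreSQL query plan (or the SQL query itself) to make better use of the available indexes when the query contains an SQL OR condition that uses the LIKE operator instead of the = operator?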

For example, consider the following query, which takes only 5 ms to execute:

explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and scoped_id_value[2] = 'G130324135454100';

"Bitmap Heap Scan on request_attribute  (cost=44.94..6617.85 rows=2081 width=139) (actual time=4.619..4.619 rows=2 loops=1)"
"  Recheck Cond: (((scoped_id_value[2])::text = 'G130324135454100'::text) AND ((domain)::text = 'externalId'::text))"
"  ->  Bitmap Index Scan on request_attribute_accession_number  (cost=0.00..44.42 rows=2081 width=0) (actual time=3.777..3.777 rows=2 loops=1)"
"        Index Cond: ((scoped_id_value[2])::text = 'G130324135454100'::text)"
"Total runtime: 5.059 ms"

As the query plan shows, this query uses the partial index request_attribute_accession_number with the index condition scoped_id_value[2] = 'G130324135454100'. The index request_attribute_accession_number is defined as follows:

CREATE INDEX request_attribute_accession_number
ON report_workflow.request_attribute((scoped_id_value[2]))
WHERE domain = 'externalId';

(Note that the column scoped_id_value in the table request_attribute has the type character varying[].)

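However, when I add an extra OR condition to the same query, one that tests the same array element scoped_id_value[2] but uses LIKE instead of =, the query now takes 7553 ms, even though it returns the same result and still contains the same first condition: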

explain analyze select *
from report_workflow.request_attribute
where domain = 'externalId'
and (scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%');

"Bitmap Heap Scan on request_attribute  (cost=7664.77..46768.27 rows=2122 width=139) (actual time=142.164..7552.650 rows=2 loops=1)"
"  Recheck Cond: ((domain)::text = 'externalId'::text)"
"  Rows Removed by Index Recheck: 1728712"
"  Filter: (((scoped_id_value[2])::text = 'G130324135454100'::text) OR ((scoped_id_value[2])::text ~~ '%G130324135454100%'::text))"
"  Rows Removed by Filter: 415884"
"  ->  Bitmap Index Scan on request_attribute_accession_number  (cost=0.00..7664.24 rows=416143 width=0) (actual time=136.249..136.249 rows=415886 loops=1)"
"Total runtime: 7553.154 ms"

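Note that this time the query optimizer ignored the index condition scoped_id_value[2] = 'G130324135454100' when performing the inner bitmap index scan on the index request_attribute_accession_number, so the scan produced 415,886 rows instead of just two as in the first query.

Why does the optimizer produce a less efficient plan than the first one once the OR condition with the LIKE operator is added to the second query? How can I tune the query optimizer, or the query, so that it behaves more like the first query?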

score 2 · Accepted Answer

In the second plan, you have:

scoped_id_value[2] like '%G130324135454100%'

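Postgres (or any other database) cannot use an ordinary B-tree index to resolve this pattern. Where in the index would the match be? It does not even know where to start, so it has to scan every candidate row.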

For this case, you could handle it by building an index on an expression (see here). However, that index would be very specific to the string 'G130324135454100'.
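As a rough sketch of how string-specific such an index would be, one variation on the idea is a partial index whose predicate contains the exact LIKE pattern (a partial index rather than a pure expression index). The index name below is made up for illustration, and whether the planner actually uses it should be confirmed with EXPLAIN ANALYZE on the real data:

```sql
-- Hypothetical index name. The index only contains rows that already
-- match this one pattern, so it is useless for any other search string.
CREATE INDEX request_attribute_g130324135454100
ON report_workflow.request_attribute ((scoped_id_value[2]))
WHERE domain = 'externalId'
  AND scoped_id_value[2] LIKE '%G130324135454100%';

-- The query has to repeat the index predicate so the planner can
-- prove that the partial index applies.
EXPLAIN ANALYZE SELECT *
FROM report_workflow.request_attribute
WHERE domain = 'externalId'
  AND scoped_id_value[2] LIKE '%G130324135454100%';
```

This only pays off when the same literal is searched repeatedly; a new index would be needed for every other value.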

I should add that the problem is not LIKE as such. Postgres would use an index for the following:

scoped_id_value[2] like 'G130324135454100%'
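One caveat worth checking: when the database does not use the C locale, a default B-tree index generally cannot serve LIKE prefix matches, and the PostgreSQL documentation describes the pattern_ops operator classes for that situation. A minimal sketch, assuming a non-C locale and using a made-up index name:

```sql
-- Hypothetical index name. varchar_pattern_ops makes the B-tree compare
-- values character by character, which is what prefix LIKE needs.
CREATE INDEX request_attribute_accession_number_prefix
ON report_workflow.request_attribute ((scoped_id_value[2]) varchar_pattern_ops)
WHERE domain = 'externalId';

-- Left-anchored pattern: the planner can turn it into a range scan.
EXPLAIN ANALYZE SELECT *
FROM report_workflow.request_attribute
WHERE domain = 'externalId'
  AND scoped_id_value[2] LIKE 'G130324135454100%';
```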

score 1 · Accepted Answer

You cannot short-circuit this expression:

scoped_id_value[2] = 'G130324135454100'
or scoped_id_value[2] like '%G130324135454100%'

into this:

scoped_id_value[2] = 'G130324135454100'

because that would miss the rows that have characters before or after the matched string:

scoped_id_value[2] like '%G130324135454100%'

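The only possible short circuit is to that last condition, and only if PostgreSQL realizes that the core string in it (the part between the % signs) is the same as the string in the equality test.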
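Put differently, every row that passes the equality test also passes the unanchored LIKE, so the OR adds nothing and the query can be written with the LIKE alone. To make that remaining pattern indexable, one option that neither answer mentions (so treat it as an assumption, not the authors' recommendation) is a trigram index from the pg_trgm extension. The index name is hypothetical, and the extension must be available on the server:

```sql
-- pg_trgm ships with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name. A GIN trigram index can support LIKE patterns
-- that are not left-anchored, such as '%...%'.
CREATE INDEX request_attribute_accession_number_trgm
ON report_workflow.request_attribute
USING gin ((scoped_id_value[2]) gin_trgm_ops)
WHERE domain = 'externalId';

-- Simplified query: the equality branch is already covered by the LIKE.
EXPLAIN ANALYZE SELECT *
FROM report_workflow.request_attribute
WHERE domain = 'externalId'
  AND scoped_id_value[2] LIKE '%G130324135454100%';
```

Trigram indexes are larger and slower to update than the original B-tree, so it is worth comparing the plans and the write overhead before adopting this.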
