sql - PostgreSQL does not use an index when the search text has 5 characters, but does with 6. Why?

Question

I am using PostgreSQL 13.

With this query, PostgreSQL uses the indexes:

SELECT *
FROM
    "players"
WHERE team_id = 3
    AND (
    code ILIKE 'lushij'
    OR
    REPLACE(lastname||firstname,' ','') ILIKE '%lushij%'
    OR REPLACE(firstname||lastname,' ','') ILIKE '%lushij%'
    OR personal_info->>'houses' ILIKE '%lushij%'
    )
LIMIT 15

Limit  (cost=333.01..385.77 rows=15 width=360)
  ->  Bitmap Heap Scan on players  (cost=333.01..4061.29 rows=1060 width=360)
        Recheck Cond: ((code ~~* 'lushij'::text) OR (replace((lastname || firstname), ' '::text, ''::text) ~~* '%lushij%'::text) OR (replace((firstname || lastname), ' '::text, ''::text) ~~* '%lushij%'::text) OR ((personal_info ->> 'houses'::text) ~~* '%lushij%'::text))
        Filter: (team_id = 3)
        ->  BitmapOr  (cost=333.01..333.01 rows=1060 width=0)
              ->  Bitmap Index Scan on players_code_trgm  (cost=0.00..116.75 rows=100 width=0)
                    Index Cond: (code ~~* 'lushij'::text)
              ->  Bitmap Index Scan on players_replace_last_first_name_trgm  (cost=0.00..66.40 rows=320 width=0)
                    Index Cond: (replace((lastname || firstname), ' '::text, ''::text) ~~* '%lushij%'::text)
              ->  Bitmap Index Scan on players_replace_first_last_name_trgm  (cost=0.00..66.40 rows=320 width=0)
                    Index Cond: (replace((firstname || lastname), ' '::text, ''::text) ~~* '%lushij%'::text)
              ->  Bitmap Index Scan on players_personal_info_houses_trgm_idx  (cost=0.00..82.40 rows=320 width=0)
                    Index Cond: ((personal_info ->> 'houses'::text) ~~* '%lushij%'::text)

With the same query but a search text one character shorter (lushi instead of lushij), the indexes are not used:

SELECT *
FROM
    "players"
WHERE team_id = 3
    AND (
    code ILIKE 'lushi'
    OR
    REPLACE(lastname||firstname,' ','') ILIKE '%lushi%'
    OR REPLACE(firstname||lastname,' ','') ILIKE '%lushi%'
    OR personal_info->>'houses' ILIKE '%lushi%'
    )
LIMIT 15

Limit  (cost=0.00..235.65 rows=15 width=360)
  ->  Seq Scan on players  (cost=0.00..76853.53 rows=4892 width=360)
        Filter: ((team_id = 3) AND ((code ~~* 'lushi'::text) OR (replace((lastname || firstname), ' '::text, ''::text) ~~* '%lushi%'::text) OR (replace((firstname || lastname), ' '::text, ''::text) ~~* '%lushi%'::text) OR ((personal_info ->> 'houses'::text) ~~* '%lushi%'::text)))

Why?

Update:

If I comment out the LIMIT 15 line, the indexes are used.

Here is the structure:

players table structure

-- ----------------------------
-- Table structure for players
-- ----------------------------
DROP TABLE IF EXISTS "public"."players";
CREATE TABLE "public"."players" (
  "id" int8 NOT NULL DEFAULT nextval('players_id_seq'::regclass),
  "created_at" timestamptz(6) NOT NULL DEFAULT now(),
  "updated_at" timestamptz(6),
  "team_id" int8 NOT NULL,
  "firstname" text COLLATE "pg_catalog"."default",
  "lastname" text COLLATE "pg_catalog"."default",
  "code" text COLLATE "pg_catalog"."default",
  "personal_info" jsonb
)
;

-- ----------------------------
-- Indexes structure for table players
-- ----------------------------
CREATE INDEX "players_personal_info_houses_trgm_idx" ON "public"."players" USING gin (
  (personal_info ->> 'houses'::text) COLLATE "pg_catalog"."default" "public"."gin_trgm_ops"
);
CREATE INDEX "players_code_trgm" ON "public"."players" USING gin (
  "code" COLLATE "pg_catalog"."default" "public"."gin_trgm_ops"
);
CREATE INDEX "players_lower_code" ON "public"."players" USING btree (
  lower(code) COLLATE "pg_catalog"."default" "pg_catalog"."text_ops" ASC NULLS LAST
);
CREATE INDEX "players_replace_first_last_name_trgm" ON "public"."players" USING gin (
  replace(firstname || lastname, ' '::text, ''::text) COLLATE "pg_catalog"."default" "public"."gin_trgm_ops"
);
CREATE INDEX "players_replace_last_first_name_trgm" ON "public"."players" USING gin (
  replace(lastname || firstname, ' '::text, ''::text) COLLATE "pg_catalog"."default" "public"."gin_trgm_ops"
);

-- ----------------------------
-- Primary Key structure for table players
-- ----------------------------
ALTER TABLE "public"."players" ADD CONSTRAINT "players_pkey" PRIMARY KEY ("id");

-- ----------------------------
-- Foreign Keys structure for table players
-- ----------------------------
ALTER TABLE "public"."players" ADD CONSTRAINT "players_team_id_fkey" FOREIGN KEY ("team_id") REFERENCES "public"."teams" ("id") ON DELETE NO ACTION ON UPDATE NO ACTION;

score 1 · Accepted Answer

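OK, this is based on my knowledge of SQL Server and SQL in general, but it probably applies here too.

First, because you are doing a SELECT *, it has to go to the clustered index at some point (PostgreSQL has no clustered indexes, so the equivalent here is a visit to the table heap).

A non-clustered index, if used, serves to identify the relevant rows, which are then fetched one by one (a nested loop join, sometimes called an index seek/scan plus key lookup).

If there are too many rows, this is actually inefficient: you end up doing more reads than if you had simply read the whole table.

Shortening the LIKE filter raises the cardinality estimate, i.e. the number of rows the query planner/optimizer expects the filter to match.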
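
To see this effect in isolation, one could ask the planner for its estimate of a single predicate at both pattern lengths. This is only a sketch against the players table from the question; the estimates depend on that table's statistics, so no output is shown here.

```sql
-- Refresh the statistics so the estimates reflect the current data.
ANALYZE players;

-- Compare the "rows=" figure on the top plan node for the two patterns.
-- The shorter pattern is expected to get the larger estimate.
EXPLAIN SELECT * FROM players WHERE personal_info->>'houses' ILIKE '%lushij%';
EXPLAIN SELECT * FROM players WHERE personal_info->>'houses' ILIKE '%lushi%';
```

The same comparison can be repeated for the code column and the two REPLACE(...) expressions to see which predicate contributes most to the jump from 1060 to 4892 estimated rows.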

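My guess is that the SQL engine makes an estimate (using the statistics on the indexes/data) and decides that reading all the data from the clustered index is likely to be more efficient than identifying the rows and reading them one by one.

Update, following the OP's update about removing the LIMIT:

Well, once again, it depends on how many rows it estimates will match the filter.

Imagine you ran ILIKE '%e%' in the original query. Every second row might match it. Because you are not sorting, it only needs to read (say) the first 30 rows of the clustered index to get your answer. Again, the query planner/optimizer may conclude that this is the most efficient way to get them.

Without the LIMIT, however, it has to read every row to get all the results.

For %e%, a single full clustered index scan is probably more efficient, because many rows are expected to match.
For more complex/selective filters, it is usually more efficient to search the indexes first (and then go directly to the data in the clustered index).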

score 1 · Accepted Answer

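The shorter the string, the less selective your condition. Based on its estimates, PostgreSQL thinks that for the short string enough rows match the condition that it is cheaper to fetch rows sequentially and discard the ones that don't match until 15 matching rows have been found.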
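
The cost figures in the two plans from the question can be reproduced by hand. The planner charges a Limit node the child's startup cost plus a share of its run cost proportional to the fraction of rows it expects to fetch. The third column below is a hypothetical: it assumes the sequential scan's total cost would be roughly the same for the 6-character pattern, which the posted plans do not show.

```sql
SELECT
    -- 5-character plan: Seq Scan, total cost 76853.53, 4892 estimated rows
    round(76853.53 * 15 / 4892, 2)                    AS seq_scan_5_chars,   -- 235.65
    -- 6-character plan: Bitmap Heap Scan, startup 333.01, total 4061.29, 1060 rows
    round(333.01 + (4061.29 - 333.01) * 15 / 1060, 2) AS bitmap_6_chars,     -- 385.77
    -- Assumption: the same seq scan cost, but with the 6-character estimate of 1060 rows
    round(76853.53 * 15 / 1060, 2)                    AS seq_scan_6_chars;   -- 1087.55
```

The first two values match the Limit costs in the posted plans. With 1060 estimated rows the bitmap plan (385.77) beats the assumed sequential scan (1087.55). With 4892 estimated rows the sequential scan drops to 235.65, which is already below the 333.01 the 6-character plan spends just building its bitmaps.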

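The many OR conditions quite likely make the optimizer underestimate how selective the combined condition is (that is, overestimate the number of matching rows), because the conditions are treated as uncorrelated, which may not be the case.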
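
One way to check whether the estimate is off, and whether the index plan would really have been faster, is sketched below. It uses the query from the question, compares the true match count with the 4892 rows the planner estimated, and then discourages sequential scans for a single transaction only. Results depend on the data, so none are shown.

```sql
-- How many rows really match? Compare with the planner's estimate of 4892.
SELECT count(*)
FROM players
WHERE team_id = 3
  AND (code ILIKE 'lushi'
       OR REPLACE(lastname||firstname,' ','') ILIKE '%lushi%'
       OR REPLACE(firstname||lastname,' ','') ILIKE '%lushi%'
       OR personal_info->>'houses' ILIKE '%lushi%');

-- Time the plan the planner rejected; SET LOCAL is undone by the ROLLBACK.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM players
WHERE team_id = 3
  AND (code ILIKE 'lushi'
       OR REPLACE(lastname||firstname,' ','') ILIKE '%lushi%'
       OR REPLACE(firstname||lastname,' ','') ILIKE '%lushi%'
       OR personal_info->>'houses' ILIKE '%lushi%')
LIMIT 15;
ROLLBACK;
```

Running the same EXPLAIN (ANALYZE, BUFFERS) without the SET LOCAL line gives the timing of the sequential scan plan for comparison. enable_seqscan is meant as a diagnostic switch, not as a permanent fix.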
