I'm currently working on a search feature that ultimately hits the database with LIKE queries. It used to look like this:
WHERE some_id = blah AND some_timestamp > blah AND (field1 LIKE '%some_text%' OR field2 LIKE '%some_text%' OR ...) ORDER BY some_timestamp DESC
This did not scale well, because the table has tens of millions of rows, especially when filtering on very old timestamps.
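After some research, it looked like a trigram index might be more efficient for text search. So I added a trigram index over the concatenation of all the text fields, and initially got great results. However, after switching to the new query I found a regression: it no longer hits the old index (a btree on some_id and some_timestamp DESC). So the new text search helps certain text queries that used to be very slow, while other text queries that used to be very fast (a few milliseconds) thanks to the btree index are now extremely slow (see below).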
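A minimal sketch of the two access paths being compared, assuming the old btree was defined exactly as `(some_id, some_timestamp DESC)`; its real name isn't given, so `table1_some_id_ts_idx` is hypothetical. The btree can return one `some_id`'s rows already in timestamp order, so a `LIMIT 20` query can stop as soon as 20 rows pass the text filter. The trigram GIN index instead looks up the trigrams of the search term (visible with pg_trgm's `show_trgm`) and returns an unordered set of candidate rows, regardless of `some_id` or the timestamp.

```sql
-- pg_trgm provides the gin_trgm_ops operator class and show_trgm()
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Minimal re-creation of the table from the question
CREATE TABLE IF NOT EXISTS table1 (
    some_id        bigint,
    field1         text,
    field2         text,
    field3         text,
    field4         text,
    field5         text,
    field6         bigint,
    some_timestamp timestamp without time zone
);

-- Old btree (hypothetical name): rows for one some_id come back newest first,
-- so "ORDER BY some_timestamp DESC LIMIT 20" can stop early
CREATE INDEX IF NOT EXISTS table1_some_id_ts_idx
    ON table1 (some_id, some_timestamp DESC);

-- The trigrams a search term is split into; a GIN trigram index returns the
-- (unordered) rows containing all of them, which are then rechecked
SELECT show_trgm('some_text');
```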
Is there a way to get the best of both worlds: fast trigram text search, and the fast btree index for the queries that need it?
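Notes:
- Postgres 11.6
- I also tried indexing the timestamp column with a btree_gin index, but performance was almost the same.
- I modified my query slightly (concatenating an empty string) to bypass the trigram index, and verified that the slow query went back to the btree index with <10 ms execution time.
- I tried a few query rearrangements to get it to use both indexes, to no avail.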
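The btree_gin note above could correspond to a multicolumn GIN index like this sketch; the question doesn't give the exact definition, so the column list and the name `table1_btree_gin_idx` are assumptions. btree_gin supplies GIN operator classes for scalar types such as bigint and timestamp, so a single index can filter on `some_id`, the timestamp range and the trigrams at once. GIN indexes cannot return rows in sorted order, though, so `ORDER BY some_timestamp DESC LIMIT 20` still has to collect every matching row and sort it, which would be consistent with the unchanged performance.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gin;  -- GIN opclasses for bigint, timestamp, ...

-- Minimal re-creation of the table from the question
CREATE TABLE IF NOT EXISTS table1 (
    some_id        bigint,
    field1         text,
    field2         text,
    field3         text,
    field4         text,
    field5         text,
    field6         bigint,
    some_timestamp timestamp without time zone
);

-- Hypothetical combined index: filters on all three conditions together,
-- but (being GIN) cannot hand rows back in some_timestamp order
CREATE INDEX IF NOT EXISTS table1_btree_gin_idx ON table1 USING gin (
    some_id,
    some_timestamp,
    (COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) gin_trgm_ops
);
```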
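The empty-string trick from the notes works because the planner matches an expression index only against the same expression in the query. A sketch, assuming the modification was appending `|| ''` to the concatenation (the exact change isn't shown, and `table1_some_id_ts_idx` is a hypothetical name for the old btree): the value is unchanged, including NULL propagation, but the expression no longer matches `trgm_idx`, so the ILIKE becomes a plain filter on whatever rows the btree path produces.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Minimal re-creation of the table and both indexes
CREATE TABLE IF NOT EXISTS table1 (
    some_id        bigint,
    field1         text,
    field2         text,
    field3         text,
    field4         text,
    field5         text,
    field6         bigint,
    some_timestamp timestamp without time zone
);
CREATE INDEX IF NOT EXISTS table1_some_id_ts_idx ON table1 (some_id, some_timestamp DESC);
CREATE INDEX IF NOT EXISTS trgm_idx ON table1 USING gin ((COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) gin_trgm_ops);

-- Same predicate as the question's query plus a trailing || '' (hypothetical):
-- identical result, but the expression no longer matches trgm_idx
SELECT *
FROM table1 i
WHERE i.some_id = 1
  AND (COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text || '') ILIKE '%some_text%'
  AND i.some_timestamp > '2015-01-01 00:00:00'
ORDER BY some_timestamp DESC
LIMIT 20;
```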
Table:
table1
---------------------------------
some_id | bigint
field1 | text
field2 | text
field3 | text
field4 | text
field5 | text
field6 | bigint
some_timestamp | timestamp without time zone
Trigram index:
CREATE INDEX CONCURRENTLY IF NOT EXISTS trgm_idx ON table1 USING gin ((COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) gin_trgm_ops);
Query:
SELECT *
FROM table1 i
WHERE i.some_id = 1
AND (COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) ILIKE '%some_text%'
AND i.some_timestamp > '2015-01-00 00:00:00.0'
ORDER BY some_timestamp DESC limit 20;
EXPLAIN ANALYZE output:
 Limit (cost=1043.06..1043.11 rows=20 width=446) (actual time=37240.094..37240.099 rows=20 loops=1)
   -> Sort (cost=1043.06..1043.15 rows=39 width=446) (actual time=37240.092..37240.095 rows=20 loops=1)
         Sort Key: some_timestamp
         Sort Method: top-N heapsort Memory: 36kB
         -> Bitmap Heap Scan on table1 i (cost=345.01..1042.03 rows=39 width=446) (actual time=1413.415..37202.331 rows=83066 loops=1)
               Recheck Cond: ((((((((((COALESCE(field1, ''::text) || ' '::text) || COALESCE(field2, ''::text)) || COALESCE(field3, ''::text)) || ' '::text) || COALESCE(field4, ''::text)) || ' '::text) || COALESCE(field5, ''::text)) || ' '::text) || (field6)::text) ~~* '%some_text%'::text)
               Rows Removed by Index Recheck: 23
               Filter: ((some_timestamp > '2015-01-00 00:00:00'::timestamp without time zone) AND (some_id = 1))
               Rows Removed by Filter: 5746666
               Heap Blocks: exact=395922
               -> Bitmap Index Scan on trgm_idx (cost=0.00..345.00 rows=667 width=0) (actual time=1325.867..1325.867 rows=5833670 loops=1)
                     Index Cond: ((((((((((COALESCE(field1, ''::text) || ' '::text) || COALESCE(field2, ''::text)) || COALESCE(field3, ''::text)) || ' '::text) || COALESCE(field4, ''::text)) || ' '::text) || COALESCE(field5, ''::text)) || ' '::text) || (field6)::text) ~~* '%some_text%'::text)
 Planning Time: 0.252 ms
 Execution Time: 37243.205 ms
(14 rows)