我想为以下用例在文本列上创建一个索引。我们有一个带有文本类型Segment
列的表。content
我们使用 pg_trgm 执行基于相似性的查询。这在翻译编辑器中用于查找相似的字符串。以下是表格详情:
CREATE TABLE public.segments
(
id integer NOT NULL DEFAULT nextval('segments_id_seq'::regclass),
language_id integer NOT NULL,
content text NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT segments_pkey PRIMARY KEY (id),
CONSTRAINT segments_language_id_fkey FOREIGN KEY (language_id)
REFERENCES public.languages (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT segments_content_language_id_key UNIQUE (content, language_id)
)
这是查询(Ruby + Hanami):
def find_by_segment_match(source_text_for_lookup, source_lang, sim_score)
aggregate(:translation_records)
.where(language_id: source_lang)
.where { similarity(:content, source_text_for_lookup) > sim_score/100.00 }
.select_append { float::similarity(:content, source_text_for_lookup).as(:similarity) }
.order { similarity(:content, source_text_for_lookup).desc }
end
- -编辑 - -
这是查询:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity" FROM "segments" WHERE (("language_id" = 2) AND (similarity("content", 'This will not work.') > 0.45)) ORDER BY SIMILARITY("content", 'This will not work.') DESC
SELECT "translation_records"."id", "translation_records"."source_segment_id", "translation_records"."target_segment_id", "translation_records"."domain_id",
"translation_records"."style_id",
"translation_records"."created_by", "translation_records"."updated_by", "translation_records"."project_name", "translation_records"."created_at", "translation_records"."updated_at", "translation_records"."language_combination", "translation_records"."uid",
"translation_records"."import_comment" FROM "translation_records" INNER JOIN "segments" ON ("segments"."id" = "translation_records"."source_segment_id") WHERE ("translation_records"."source_segment_id" IN (27548)) ORDER BY "translation_records"."id"
---结束编辑---
---编辑1---
重新索引呢?最初,我们将导入大约 200 万条旧记录。如果有的话,我们应该何时以及多久重建一次索引?
---结束编辑1---
像 CREATE INDEX ON Segment USING gist (content) 这样的东西可以吗?我真的找不到哪个可用索引最适合我们的用例。
最好的,塞巴