postgresql - How to combine full-text search and trigrams in Postgres

Question

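I'm developing a search system for a database of git commits. I currently use full-text search so that users can search by author, commit date, log message, and commit hash. Right now the commit hash is only useful when the user supplies the entire hash, which is long and hard to remember, although a full hash is handy for pinpointing a single commit.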

The query against the database is essentially this:

SELECT
    cid,
    (ts_rank(tsv, q) + ts_rank_cd(tsv, q)) AS rank
FROM
    search,
    plainto_tsquery(%(query)s) AS q
WHERE
    (tsv @@ q);

where cid is the commit hash and tsv is a text-search vector built from each commit's relevant information.

My goal is to let users type only part of a commit hash in their query and get back every commit that matches what they typed.

I looked into trigrams, which seem the most promising approach, but I'm not sure how to integrate them into this query.

score 1 · Accepted Answer

1: Create a column, view, or materialized view from the tsvector data (here, a materialized view of every distinct lexeme):

-- ts_stat() takes the inner query as a string; dollar quoting avoids
-- having to escape the single quotes inside it.
CREATE MATERIALIZED VIEW unique_lexeme AS
SELECT word FROM ts_stat($$
    SELECT to_tsvector('simple', post.title) ||
           to_tsvector('simple', post.content) ||
           to_tsvector('simple', author.name) ||
           -- coalesce() stops posts without tags from turning the whole vector NULL
           to_tsvector('simple', coalesce(string_agg(tag.name, ' '), ''))
    FROM post
    JOIN author ON author.id = post.author_id
    LEFT JOIN posts_tags ON posts_tags.post_id = post.id
    LEFT JOIN tag ON tag.id = posts_tags.tag_id
    GROUP BY post.id, author.id
$$);
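With the question's own schema, step 1 gets shorter, because `search.tsv` already holds each commit's tsvector and can be handed to `ts_stat()` directly. The following is a minimal sketch, assuming the pg_trgm extension can be installed in a scratch database; the `search` table and its two sample rows are hypothetical stand-ins for the question's data, and the `gist_trgm_ops` index is an extra (not part of the answer) that lets the trigram operators `%` and `<->` be index-assisted:

```sql
-- Trigram functions, operators and operator classes come from pg_trgm.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-in for the question's table: one row per commit.
-- Materialized views cannot reference temporary tables, so this is a regular table.
CREATE TABLE search (
    cid text PRIMARY KEY,
    tsv tsvector NOT NULL
);

INSERT INTO search (cid, tsv) VALUES
    ('a3f1c9e27b4d5068f2e1a9c3b7d40e6f5a8b2c91',
     to_tsvector('simple', 'alice Fix something in the parser')),
    ('c0ffee1234567890abcdef1234567890abcdef12',
     to_tsvector('simple', 'bob Add initial README'));

-- Every distinct lexeme across all commits, stored once.
CREATE MATERIALIZED VIEW unique_lexeme AS
SELECT word FROM ts_stat('SELECT tsv FROM search');

-- Trigram index on the lexemes (GiST supports both % and <-> ordering).
CREATE INDEX unique_lexeme_word_trgm_idx
    ON unique_lexeme USING gist (word gist_trgm_ops);
```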

2: Select from that view using trigrams (similarity() and <-> are provided by the pg_trgm extension):

SELECT word
FROM unique_lexeme
WHERE similarity(word, 'samething') > 0.5 
ORDER BY word <-> 'samething';

(For details, search for "misspelling" on http://rachbelaid.com/postgres-full-text-search-is-good-enough/)
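As written, step 2 evaluates `similarity()` for every lexeme. A sketch of an index-friendly variant, assuming PostgreSQL 9.6 or later (where the `pg_trgm.similarity_threshold` setting exists): the `%` operator performs the same threshold test and, like `<->`, can use a `gist_trgm_ops` index. The `lexeme_demo` table is a hypothetical stand-in for `unique_lexeme`:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-in for the unique_lexeme view from step 1.
CREATE TEMP TABLE lexeme_demo (word text);
INSERT INTO lexeme_demo VALUES ('something'), ('parser'), ('readme');
CREATE INDEX ON lexeme_demo USING gist (word gist_trgm_ops);

-- word % 'samething' compares similarity() against pg_trgm.similarity_threshold
-- (default 0.3); raise the threshold to match the 0.5 used in step 2.
SET pg_trgm.similarity_threshold = 0.5;

SELECT word, similarity(word, 'samething') AS sim
FROM lexeme_demo
WHERE word % 'samething'
ORDER BY word <-> 'samething';
```

If the query is sent through psycopg2 with `%(query)s` placeholders as in the question, a literal `%` operator has to be written as `%%` in the query string.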

3: Once you have found the matching words, use them to rank the results, as a subquery:

SELECT word FROM unique_lexeme
WHERE similarity(word, 'samething') > 0.5 ORDER BY word <-> 'samething';
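One way to plug the step 3 subquery into the question's ranking query is to OR the similar lexemes together into a single `tsquery`. This is a sketch under stated assumptions, not the answer's own code: the two tables are small hypothetical stand-ins for `search` and `unique_lexeme`, and `'samething'` stands for the user's input, which in practice would be a bound parameter such as `%(query)s`:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-ins for the question's search table and the step 1 view.
CREATE TEMP TABLE search_demo (cid text PRIMARY KEY, tsv tsvector);
INSERT INTO search_demo VALUES
    ('a3f1c9e27b4d5068f2e1a9c3b7d40e6f5a8b2c91',
     to_tsvector('simple', 'Fix something in the parser')),
    ('c0ffee1234567890abcdef1234567890abcdef12',
     to_tsvector('simple', 'Add initial README'));
CREATE TEMP TABLE search_lexeme_demo AS
SELECT word FROM ts_stat('SELECT tsv FROM search_demo');

SELECT
    s.cid,
    ts_rank(s.tsv, m.q) + ts_rank_cd(s.tsv, m.q) AS rank
FROM
    search_demo AS s,
    -- Build one tsquery such as 'something' | 'somethin' from the similar lexemes:
    -- quote_literal() quotes each lexeme, the cast parses the string as a tsquery.
    (SELECT string_agg(quote_literal(word), ' | ')::tsquery AS q
     FROM search_lexeme_demo
     WHERE similarity(word, 'samething') > 0.5) AS m
WHERE
    s.tsv @@ m.q
ORDER BY rank DESC;
```

If no lexeme passes the threshold, `string_agg()` yields NULL and the query simply returns no rows.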

Alternatively, you can simply write a subquery that checks the similarity directly.
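For the question's actual goal, a fragment of a commit hash, checking `cid` directly is probably simpler than going through the lexeme list. A short fragment shares only a handful of trigrams with a full 40-character hash, so its `similarity()` score stays low, whereas pg_trgm's index operator classes also accelerate `LIKE` substring matches. A sketch, assuming pg_trgm is available; `commit_demo` is a hypothetical stand-in for `search`, and `'a3f1c9e'` stands for the user's input:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-in for the question's search table.
CREATE TEMP TABLE commit_demo (cid text PRIMARY KEY, tsv tsvector);
INSERT INTO commit_demo VALUES
    ('a3f1c9e27b4d5068f2e1a9c3b7d40e6f5a8b2c91',
     to_tsvector('simple', 'Fix something in the parser')),
    ('c0ffee1234567890abcdef1234567890abcdef12',
     to_tsvector('simple', 'Add initial README'));

-- GIN trigram index on the hash; pg_trgm documents LIKE/ILIKE as indexable with it.
CREATE INDEX ON commit_demo USING gin (cid gin_trgm_ops);

-- Full-text match OR hash-fragment match in one query.
-- In application code both occurrences of the input would be the same bound parameter.
SELECT
    cid,
    ts_rank(tsv, q) + ts_rank_cd(tsv, q) AS rank
FROM
    commit_demo,
    plainto_tsquery('simple', 'a3f1c9e') AS q
WHERE
    tsv @@ q
    OR cid LIKE ('%' || 'a3f1c9e' || '%')
ORDER BY rank DESC;
```

If users only ever type the beginning of a hash, a prefix match (`cid LIKE 'a3f1c9e' || '%'`) is enough, and a B-tree index with `text_pattern_ops` can serve it. With psycopg2 placeholders, the literal `%` signs in the pattern must be doubled (`'%%'`).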

Additionally:

Index the tsvector column.
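For a `tsvector` column such as the question's `search.tsv`, a GIN index is the usual choice and lets `tsv @@ q` avoid a sequential scan on large tables. A minimal sketch; `search_idx_demo` is a hypothetical throwaway table:

```sql
-- Hypothetical stand-in for the question's search table.
CREATE TEMP TABLE search_idx_demo (cid text PRIMARY KEY, tsv tsvector);

-- GIN index on the tsvector column, usable by the @@ match operator.
CREATE INDEX search_idx_demo_tsv_idx ON search_idx_demo USING gin (tsv);
```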

Refresh the materialized view concurrently (http://www.postgresqltutorial.com/postgresql-materialized-views/).
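`REFRESH MATERIALIZED VIEW CONCURRENTLY` (PostgreSQL 9.4+) rebuilds the view without locking out readers, but it requires a unique index on the view. Since `ts_stat()` returns each word once, `word` works as that key. A sketch with hypothetical table and view names, run in a scratch database:

```sql
-- Regular table, because materialized views cannot reference temporary tables.
DROP MATERIALIZED VIEW IF EXISTS lexeme_mv_demo;
DROP TABLE IF EXISTS search_mv_demo;
CREATE TABLE search_mv_demo (cid text PRIMARY KEY, tsv tsvector);
INSERT INTO search_mv_demo VALUES
    ('a3f1c9e27b4d5068f2e1a9c3b7d40e6f5a8b2c91',
     to_tsvector('simple', 'Fix something in the parser'));

CREATE MATERIALIZED VIEW lexeme_mv_demo AS
SELECT word FROM ts_stat('SELECT tsv FROM search_mv_demo');

-- CONCURRENTLY needs a unique index covering all rows of the view.
CREATE UNIQUE INDEX lexeme_mv_demo_word_idx ON lexeme_mv_demo (word);

-- New commits arrive...
INSERT INTO search_mv_demo VALUES
    ('c0ffee1234567890abcdef1234567890abcdef12',
     to_tsvector('simple', 'Add initial README'));

-- ...and the view is rebuilt while SELECTs on it keep working.
REFRESH MATERIALIZED VIEW CONCURRENTLY lexeme_mv_demo;
```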

Use a trigger to keep the column up to date (https://www.postgresql.org/docs/9.0/textsearch-features.html).
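PostgreSQL ships a built-in trigger function, `tsvector_update_trigger`, that recomputes a tsvector column from listed text columns on every insert or update. A sketch; the question does not show how `search` is populated, so the `author` and `message` columns below are hypothetical:

```sql
-- Hypothetical columns standing in for the commit's author and log message.
CREATE TEMP TABLE search_trg_demo (
    cid     text PRIMARY KEY,
    author  text,
    message text,
    tsv     tsvector
);

-- Recompute tsv from author and message before each INSERT or UPDATE.
-- The text search configuration name must be schema-qualified here.
-- (EXECUTE PROCEDURE also works on PostgreSQL 11+, where EXECUTE FUNCTION is preferred.)
CREATE TRIGGER search_trg_demo_tsv_update
    BEFORE INSERT OR UPDATE ON search_trg_demo
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.simple', author, message);

INSERT INTO search_trg_demo (cid, author, message)
VALUES ('a3f1c9e27b4d5068f2e1a9c3b7d40e6f5a8b2c91', 'alice', 'Fix something in the parser');
```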
