postgresql - Query lexemes by number of occurrences in a ts_vector in PostgreSQL 9.4

Question

Is it possible to query PostgreSQL with a WHERE clause based on how many times a lexeme occurs in a ts_vector?

For example, if you create a ts_vector from the phrase "top hat on top of the cat", can you SELECT * FROM table WHERE ts_vector @@ {the lexeme 'top' appears twice}?

score 1 · Accepted Answer

You can use this function:

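create or replace function number_of_occurrences(vector tsvector, token text)
returns integer language sql stable as $$
    -- each entry in the text form looks like 'lexeme':pos1,pos2,...
    -- so the number of positions is the number of commas plus one
    select coalesce((
        select length(elem) - length(replace(elem, ',', '')) + 1
        from unnest(string_to_array(vector::text, ' ')) elem
        -- match the whole quoted lexeme, so 'top' does not also match 'topic'
        where strpos(elem, '''' || token || ''':') = 1), 0)
$$;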

select number_of_occurrences(to_tsvector('top hat on top of the cat'), 'top');

 number_of_occurrences 
-----------------------
                     2
(1 row)
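
The counting step relies on the text form of a tsvector entry, where positions are comma-separated, so the number of positions is the number of commas plus one. A minimal sketch of that arithmetic on a literal entry (the literal is illustrative, copied from the vector's text output):

```sql
-- 'top':1,4 contains one comma, so it carries two positions.
select length('''top'':1,4')
     - length(replace('''top'':1,4', ',', ''))
     + 1 as positions;
```

An entry with a single position, such as 'cat':7, contains no comma and therefore yields 1.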

Of course, the function only works correctly if the vector contains lexemes with positions.

select to_tsvector('top hat on top of the cat');

                   to_tsvector                   
-------------------------------------------------
 'cat':7 'hat':2 'of':5 'on':3 'the':6 'top':1,4
(1 row)
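
For contrast, here is a small sketch of a vector without positions, produced with the built-in strip() function. The explicit 'simple' configuration is an assumption: it matches the output above, which keeps words such as 'of' and 'the' that a configuration like 'english' would drop as stop words.

```sql
-- strip() removes positions and weights, leaving only the lexemes.
select strip(to_tsvector('simple', 'top hat on top of the cat'));
-- 'cat' 'hat' 'of' 'on' 'the' 'top'
```

Once the positions are gone, the vector no longer records that 'top' occurred twice, so no counting approach can recover that information.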

Example of using the function:

SELECT * 
FROM a_table 
WHERE ts_vector @@ to_tsquery('top')
AND number_of_occurrences(ts_vector, 'top') = 2;

score 0 · Answer

You can use a combination of unnest and array_length for this:

-- unnest(tsvector) requires PostgreSQL 9.6 or later
SELECT *
FROM a_table
WHERE (
  SELECT array_length(positions, 1)
  FROM unnest(ts_vector)
  WHERE lexeme = 'top'
) = 2;

I don't think this can use a GIN index on ts_vector, but it will probably be faster than the string manipulation done in the accepted answer's function.
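
One way to let an index help, sketched here under assumptions, is to keep the @@ condition from the accepted answer as a pre-filter and count positions only for the rows that match. The table docs and the column body_tsv are hypothetical names used for illustration, and unnest(tsvector) is only available in PostgreSQL 9.6 or later.

```sql
-- Hypothetical demo table; docs and body_tsv are illustrative names.
CREATE TEMP TABLE docs (id int, body_tsv tsvector);

INSERT INTO docs VALUES
  (1, to_tsvector('simple', 'top hat on top of the cat')),
  (2, to_tsvector('simple', 'top hat on the cat'));

-- The @@ condition can be served by a GIN index on body_tsv, if one exists.
-- The subquery then counts positions for the remaining candidate rows.
SELECT d.id
FROM docs d
WHERE d.body_tsv @@ to_tsquery('simple', 'top')
  AND (
    SELECT array_length(u.positions, 1)
    FROM unnest(d.body_tsv) AS u
    WHERE u.lexeme = 'top'
  ) = 2;
```

With the two sample rows, only id 1 qualifies, because row 2 contains 'top' at a single position.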
