postgresql - How to retrieve the difference between two tsvectors in postgres?

Question

I have two varchar fields, and I want to get an array of the words that exist in one of them but not in the other, i.e.:

old_text := to_tsvector('The quick brown fox jumps over the lazy dog')
new_text := to_tsvector('The slow brown fox jumps over the quick dog at Friday')
-> new words: ARRAY['slow', 'at', 'Friday'] ( the order of words doesn't matter )

I tried playing around with tsvectors, but had no luck. Is there any other feature in postgres that supports something like this?

score 1 · Accepted Answer

If you really want to involve text search, take a look at ts_parse().

SELECT token
FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
WHERE tokid != 12 -- blank
EXCEPT
SELECT token
FROM ts_parse('default', 'The quick brown fox jumps over the lazy dog')
WHERE tokid != 12 -- blank

-- will give you

"token"
--------
'slow'
'at'
'Friday'

Alternatively, you can use regular expressions for this:

SELECT *
FROM regexp_split_to_table('The slow brown fox jumps over the quick dog at Friday', '\s+')
EXCEPT
SELECT *
FROM regexp_split_to_table('The quick brown fox jumps over the lazy dog', '\s+')

Finally, use array_agg() to accumulate the results into an array, if necessary.
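As a minimal sketch of that last step, the regular-expression variant can be wrapped in a subquery and aggregated. The aliases `t(word)`, `diff`, and `new_words` are names chosen here for illustration, and the literals are the sample sentences from the question; in practice they would be your two varchar columns.

```sql
-- Sketch: collect the words that appear only in the new text into one array.
-- Aliases t(word), diff and new_words are illustrative, not required names.
SELECT array_agg(word) AS new_words
FROM (
    -- words of the new text, split on runs of whitespace
    SELECT word
    FROM regexp_split_to_table('The slow brown fox jumps over the quick dog at Friday', '\s+') AS t(word)
    EXCEPT
    -- minus the words of the old text
    SELECT word
    FROM regexp_split_to_table('The quick brown fox jumps over the lazy dog', '\s+') AS t(word)
) AS diff;
```

For the sample sentences the array should contain slow, at, and Friday, in no guaranteed order. The comparison is case-sensitive and treats attached punctuation as part of a word, so you may want to normalize the input (for example with lower()) first.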
