You certainly can do that, but I doubt it would be very useful:
select *,levenshtein(lexeme,'color') from things, unnest(to_tsvector('english',description))
order by levenshtein;
 id |    description     | lexeme | positions | weights | levenshtein
----+--------------------+--------+-----------+---------+-------------
  3 | Painting colors    | color  | {2}       | {D}     |           0
  1 | A red coloured car | colour | {3}       | {D}     |           1
  1 | A red coloured car | car    | {4}       | {D}     |           3
  1 | A red coloured car | red    | {2}       | {D}     |           5
  3 | Painting colors    | paint  | {1}       | {D}     |           5
  2 | The garden         | garden | {2}       | {D}     |           6
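Presumably you would want to embellish the query to apply some cutoff, probably one that depends on the lengths of the strings, and to return only the best result for each description, assuming it meets that cutoff. Doing that should be just routine SQL manipulation.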
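As a rough sketch of that embellishment, the query below keeps only the closest lexeme per row and drops rows whose best match is too far away. The cutoff rule (at most one third of the longer string's length, using integer division) and the aliases t and u are assumptions made for illustration, not something the question specifies; levenshtein() comes from the fuzzystrmatch extension.

```sql
-- Assumes: CREATE EXTENSION fuzzystrmatch;  (provides levenshtein)
-- The "one third of the longer string" cutoff is an arbitrary choice for this sketch.
select distinct on (t.id)
       t.id,
       t.description,
       u.lexeme,
       levenshtein(u.lexeme, 'color') as distance
from things t
cross join lateral unnest(to_tsvector('english', t.description)) as u
where levenshtein(u.lexeme, 'color')
      <= greatest(length(u.lexeme), length('color')) / 3
order by t.id, distance;   -- distinct on keeps the first (closest) lexeme per id
```

To rank the surviving rows by distance rather than by id, wrap this in an outer query and order by distance there.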
Perhaps better would be the word-similarity operators recently added to pg_trgm:
select *, description <->> 'color' as distance from things order by description <->> 'color';
 id |    description     | distance
----+--------------------+----------
  3 | Painting colors    | 0.166667
  1 | A red coloured car | 0.333333
  2 | The garden         |        1
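If a cutoff is wanted here as well, pg_trgm's <% operator filters on word similarity using the pg_trgm.word_similarity_threshold setting. The following is a minimal sketch; the optional GiST index is a suggestion for larger tables, not something the question requires.

```sql
-- Assumes: CREATE EXTENSION pg_trgm;
-- Optional: a GiST trigram index can serve both the <% filter and the <->> ordering.
create index on things using gist (description gist_trgm_ops);

-- Keep only rows whose best-matching stretch of text is similar enough to 'color',
-- as judged by pg_trgm.word_similarity_threshold, and list the closest first.
select *, description <->> 'color' as distance
from things
where 'color' <% description
order by description <->> 'color';
```

The threshold can be adjusted per session with set pg_trgm.word_similarity_threshold if the default proves too strict or too loose.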
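Another option would be to find a stemmer or thesaurus that standardizes British/American spellings (I am not aware of one readily available) and then not use fuzzy matching at all. I think this would be the best option, if you can do it.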
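If such a word list can be found or compiled by hand, one way to plug it in is a synonym dictionary placed ahead of the stemmer. This is only a sketch: the names uk_us and english_uk_us are hypothetical, and the file uk_us.syn would have to be written by you and placed in $SHAREDIR/tsearch_data. Because a synonym dictionary's output is not passed on to the stemmer, each entry should map directly to the stemmed form.

```sql
-- Hypothetical file $SHAREDIR/tsearch_data/uk_us.syn, one "word replacement" pair per line:
--   colour   color
--   colours  color
--   coloured color

create text search dictionary uk_us (
    template = synonym,
    synonyms = uk_us          -- file name without the .syn extension
);

-- Copy the built-in English configuration and consult the synonyms before stemming.
create text search configuration english_uk_us (copy = english);
alter text search configuration english_uk_us
    alter mapping for asciiword with uk_us, english_stem;

-- Ordinary full-text matching, no fuzzy comparison needed.
select *
from things
where to_tsvector('english_uk_us', description)
      @@ to_tsquery('english_uk_us', 'color');
```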