I am trying to find a way to build a fuzzy search where both the text database and the queries may have spelling variants. In particular, the text database is material collected from the web and likely would not benefit from a full-text engine's preprocessing phase (word stemming).

I could imagine using pg_trgm as a starting point and then validating hits by Levenshtein distance. However, people tend to do prefix queries. E.g., in the realm of music, I would expect "beetho symphony" to be a reasonable search term. So, if someone were typing "betho symphony", is there a reasonable way (using PostgreSQL, perhaps with Tcl or Perl scripting) to discover that the "betho" part should be compared with "beetho" (returning an edit distance of 1)?
2 Answers
Example results: for the query "fantas" and the database words "fantasy" and "fantastic" => 0; for the query "fantas" and the database word "fan" => 3.
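One way to get results like these is to take the smallest Levenshtein distance between the query and any prefix of the database word. The sketch below is only an illustration of that idea in Python, not something taken from the answer itself; the function name `prefix_edit_distance` is made up for this example, and unit costs for insertion, deletion, and substitution are assumed.

```python
def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `word`.

    Hypothetical helper: assumes unit cost for every edit operation.
    """
    # prev[j] holds the distance between query[:i] and word[:j].
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i]  # turning query[:i] into the empty prefix costs i deletions
        for j, wc in enumerate(word, start=1):
            cost = 0 if qc == wc else 1
            curr.append(min(prev[j] + 1,          # delete a query character
                            curr[j - 1] + 1,      # insert a word character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # The last row compares the whole query with every prefix of the word,
    # so its minimum is the best prefix match.
    return min(prev)


print(prefix_edit_distance("fantas", "fantasy"))
print(prefix_edit_distance("fantas", "fan"))
print(prefix_edit_distance("betho", "beethoven"))
```

Because the minimum is taken over the whole last row, the unmatched tail of a longer word ("y" in "fantasy", "ven" in "beethoven") is never charged, while a word shorter than the query still pays for the missing characters.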
Answered 2013-04-22T16:23:49.353
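For example: for each i and j, use Match(i,j) = 1/max(i,j)^2 instead of Match(i,j) = 1. (Here i and j are the positions of the symbols you are comparing.)
The effect is that dist('ABCD', 'ABCE') < dist('ABCD', 'EBCD'), i.e. a difference near the end of a word costs less than one near the beginning.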
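A rough Python sketch of this weighting follows. It reads Match(i,j) as the substitution cost charged when the symbols at 1-based positions i and j differ, and it assumes insertions and deletions keep a cost of 1, since the answer does not say how they should be weighted. The function name `weighted_distance` is invented for the illustration.

```python
def weighted_distance(a: str, b: str) -> float:
    """Levenshtein-style distance with position-dependent substitution cost.

    Assumptions (not stated in the answer): positions are 1-based, a mismatch
    at positions (i, j) costs 1 / max(i, j)**2, and insertions and deletions
    cost 1 each.
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            # Mismatches far from the start of the word are cheap.
            sub = 0.0 if ca == cb else 1.0 / max(i, j) ** 2
            curr.append(min(prev[j] + 1.0,       # deletion
                            curr[j - 1] + 1.0,   # insertion
                            prev[j - 1] + sub))  # match or weighted substitution
        prev = curr
    return prev[-1]


# A mismatch in the last position is cheaper than one in the first position.
print(weighted_distance("ABCD", "ABCE") < weighted_distance("ABCD", "EBCD"))
```

With these assumptions the mismatch at position 4 costs 1/16, while the mismatch at position 1 costs the full 1, which reproduces the inequality above.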
Answered 2013-04-16T17:45:46.477