
I'm trying to implement spelling auto-correct using the Jaro-Winkler strategy. I have a list of suggestions, and the typed word is ranked against each suggested word. The problem I'm facing: when "ans", "anf", or "anr" is typed, "an" gets the highest rank in the comparison, while "and" is far down the score list. As a result, "ans"/"anf"/"anr" are replaced with "an" instead of "and".

Any suggestions on how I should solve this? Or is there another algorithm that would reliably replace "ans"/"anf"/"anr" with "and" rather than "an"?


1 Answer


For general spelling errors, weighting transpositions higher than deletions/additions seems like a good idea.

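Assuming your entries are typed on a standard keyboard layout (QWERTY?), you could add weight based on the physical distance between keys. I'm not sure of the best way to do this logically. Off the top of my head, you could create a two-dimensional array containing a map of the keyboard and compare the actual (Pythagorean) distances.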

Given a map with 'Q'=[0][0], 'W'=[0][1], 'A'=[1][0], the distance A->Q would be 1, Q->W = 1, and Q->S = sqrt(2). That should give you a usable distance-based weight.
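A minimal Python sketch of that keyboard map, assuming a simplified QWERTY grid with no row stagger (so 'Q'=[0][0], 'W'=[0][1], 'A'=[1][0] as above). The names `KEY_ROWS`, `KEY_POS`, and `key_distance` are made up for illustration:

```python
import math

# Simplified QWERTY grid. Assumption: rows are treated as perfectly aligned
# columns, whereas a physical keyboard offsets each row slightly.
KEY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each letter to its (row, column) position in the grid.
KEY_POS = {
    ch: (row, col)
    for row, keys in enumerate(KEY_ROWS)
    for col, ch in enumerate(keys)
}


def key_distance(a: str, b: str) -> float:
    """Euclidean (Pythagorean) distance between two letter keys."""
    (r1, c1), (r2, c2) = KEY_POS[a.lower()], KEY_POS[b.lower()]
    return math.hypot(r1 - r2, c1 - c2)
```

With this grid, 's' and 'f' are each at distance 1 from 'd', and 'r' is at distance sqrt(2), so all three typos from the question sit on keys next to the intended letter.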

There may be a cleaner implementation for the distance calculation, but I'm just spitballing here.
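One possible way to plug such a key-distance weight into the ranking is sketched below. Note that this swaps Jaro-Winkler for a weighted Levenshtein-style edit distance, which is my assumption rather than anything prescribed above. The cost constants (`INDEL_COST`, `MAX_SUB_COST`, the 0.25 scale factor) and all function names are hypothetical and would need tuning, and transpositions are not handled in this sketch:

```python
import math

# Simplified, unstaggered QWERTY grid (an assumption, as in the mapping above).
KEY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, keys in enumerate(KEY_ROWS) for c, ch in enumerate(keys)}

INDEL_COST = 1.0    # assumed cost of inserting or deleting one letter
MAX_SUB_COST = 1.0  # assumed cap for substituting far-apart keys


def substitution_cost(a: str, b: str) -> float:
    """Cheap for neighbouring keys, more expensive for distant ones."""
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return MAX_SUB_COST
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    # 0.25 per unit of key distance is an arbitrary illustrative scale.
    return min(MAX_SUB_COST, 0.25 * math.hypot(r1 - r2, c1 - c2))


def weighted_edit_distance(typed: str, candidate: str) -> float:
    """Levenshtein-style dynamic program with keyboard-aware substitutions."""
    typed, candidate = typed.lower(), candidate.lower()
    prev = [j * INDEL_COST for j in range(len(candidate) + 1)]
    for i, tc in enumerate(typed, start=1):
        curr = [i * INDEL_COST]
        for j, cc in enumerate(candidate, start=1):
            curr.append(min(
                prev[j] + INDEL_COST,                     # drop typed letter
                curr[j - 1] + INDEL_COST,                 # add candidate letter
                prev[j - 1] + substitution_cost(tc, cc),  # swap one key for another
            ))
        prev = curr
    return prev[-1]


def rank(typed: str, suggestions: list[str]) -> list[str]:
    """Return suggestions ordered from closest to farthest."""
    return sorted(suggestions, key=lambda s: weighted_edit_distance(typed, s))
```

Under these assumed costs, `rank("ans", ["an", "and"])` puts "and" first, because replacing 's' with the neighbouring 'd' is cheaper than deleting a letter to reach "an".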

answered 2012-07-02T17:30:08.047