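The ratio is computed as (lensum - ldist) / lensum, where lensum is the sum of the lengths of the two strings.
ldist is not the distance; it is the sum of the costs.
In the matrix, each number for a pair of non-matching characters comes from the cell above, the cell to the left, or the diagonal cell.
If the number comes from the left it is an insertion, if it comes from above it is a deletion, and if it comes from the diagonal it is a replacement.
Insertions and deletions cost 1, and a replacement costs 2. The replacement cost is 2 because a replacement is a deletion plus an insertion.
For "ab" and "ac" the cost is 2, because the single edit is a replacement.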
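The cost rules above can be sketched as a small dynamic-programming table. This is a minimal pure-Python illustration, not the library's actual C implementation; the names `weighted_edit_cost` and `weighted_ratio` are hypothetical, and returning 1.0 for two empty strings is an assumption made here to avoid dividing by zero.

```python
def weighted_edit_cost(source: str, target: str) -> int:
    """Edit cost with insertion = 1, deletion = 1, replacement = 2."""
    rows, cols = len(source) + 1, len(target) + 1
    # cost[i][j] is the cost of turning source[:i] into target[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete every character of source[:i]
    for j in range(cols):
        cost[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            if source[i - 1] == target[j - 1]:
                # Matching characters: take the diagonal at no extra cost.
                cost[i][j] = cost[i - 1][j - 1]
            else:
                cost[i][j] = min(
                    cost[i - 1][j] + 1,      # from above: deletion
                    cost[i][j - 1] + 1,      # from the left: insertion
                    cost[i - 1][j - 1] + 2,  # from the diagonal: replacement
                )
    return cost[-1][-1]


def weighted_ratio(source: str, target: str) -> float:
    """(lensum - ldist) / lensum, with ldist being the weighted cost."""
    lensum = len(source) + len(target)
    if lensum == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (lensum - weighted_edit_cost(source, target)) / lensum


print(weighted_edit_cost("ab", "ac"))  # 2 (one replacement)
print(weighted_edit_cost("ab", "a"))   # 1 (one deletion)
print(weighted_ratio("ab", "ac"))      # 0.5
```

With these weights a replacement never beats a deletion followed by an insertion, which is why the cost for "ab" and "ac" is 2 even though the plain edit distance is 1.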
>>> import Levenshtein as lev
>>> lev.distance("ab","ac")
1
>>> lev.ratio("ab","ac")
0.5
>>> (4.0-1.0)/4.0 # Wrong: the distance is 1, but the cost is 2 because the edit is a replacement
0.75
>>> lev.ratio("ab","a")
0.6666666666666666
>>> lev.distance("ab","a")
1
>>> (3.0-1.0)/3.0 # Coincidence: the single edit is a deletion, whose cost is 1, so the distance equals the cost
0.6666666666666666
>>> x="ab"
>>> y="ac"
>>> lev.editops(x,y)
[('replace', 1, 1)]
>>> ldist = sum(2 if op == 'replace' else 1 for op, _, _ in lev.editops(x, y))
>>> ldist
2
>>> ln=len(x)+len(y)
>>> ln
4
>>> (4.0-2.0)/4.0
0.5
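More information: python-Levenshtein ratio calculation
Another example:
The cost is 9 (4 replacements => 4*2 = 8, plus 1 deletion => 1*1 = 1; 8 + 1 = 9).
str1=len("google") #6
str2=len("look-at") #7
str1 + str2 #13
distance = 5 (the value in cell (7, 6) of the matrix)
The ratio is (13-9)/13 = 0.3076923076923077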
>>> c="look-at"
>>> d="google"
>>> lev.editops(c,d)
[('replace', 0, 0), ('delete', 3, 3), ('replace', 4, 3), ('replace', 5, 4), ('replace', 6, 5)]
>>> lev.ratio(c,d)
0.3076923076923077
>>> lev.distance(c,d)
5