我修改了这篇文章中的 Levenshtein 距离 VBA 函数以使用一维数组。它执行得更快。
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance2(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long, ssL As Long, cost As Long 'loop counters, loop step, loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If Mid$(s1, i Mod ssL, 1) <> Mid$(s2, j, 1) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance2 = D(LD)
End Function
我已经用长度为 11,304 的字符串 's1' 和长度为 5,665 的字符串 's2' 测试了这个函数(> 6400 万个字符比较)。使用上述一维版本的函数,在我的机器上执行时间约为 24 秒。我在上面的链接中引用的原始二维函数对于相同的字符串需要大约 37 秒。我已经进一步优化了一维函数,如下所示,相同的字符串需要大约 10 秒。
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long 'loop counters, loop step
Dim ssL As Long, cost As Long 'loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
Dim sss1() As String, sss2() As String 'Character arrays for string S1 & S2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
ReDim sss1(1 To L1) 'Size character array S1
ReDim sss2(1 To L2) 'Size character array S2
For i = 1 To L1 Step 1: sss1(i) = Mid$(s1, i, 1): Next i 'Fill S1 character array
For i = 1 To L2 Step 1: sss2(i) = Mid$(s2, i, 1): Next i 'Fill S2 character array
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If sss1(i Mod ssL) <> sss2(j) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance = D(LD)
End Function