Can I measure the distance between two strings with Ruby?
I.e.:
compare('Test', 'est') # Returns 1
compare('Test', 'Tes') # Returns 1
compare('Test', 'Tast') # Returns 1
compare('Test', 'Taste') # Returns 2
compare('Test', 'tazT') # Returns 5
Much easier and faster, thanks to the native C binding:
gem install levenshtein-ffi
gem install levenshtein
require 'levenshtein'
Levenshtein.normalized_distance string1, string2, threshold
http://rubygems.org/gems/levenshtein
http://rubydoc.info/gems/levenshtein/0.2.2/frames
I found this for you:
def levenshtein_distance(s, t)
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0
  d = Array.new(m+1) {Array.new(n+1)}

  (0..m).each {|i| d[i][0] = i}
  (0..n).each {|j| d[0][j] = j}
  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i-1] == t[j-1] # adjust index into string
                  d[i-1][j-1]       # no operation required
                else
                  [ d[i-1][j]+1,    # deletion
                    d[i][j-1]+1,    # insertion
                    d[i-1][j-1]+1,  # substitution
                  ].min
                end
    end
  end
  d[m][n]
end

[ ['fire','water'], ['amazing','horse'], ["bamerindos", "giromba"] ].each do |s,t|
  puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}"
end
That's awesome output: =)
levenshtein_distance('fire', 'water') = 4
levenshtein_distance('amazing', 'horse') = 7
levenshtein_distance('bamerindos', 'giromba') = 9
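The version above builds the full (m+1) x (n+1) table, but each cell only looks at the current row and the one before it. As a rough sketch (the method name `levenshtein_two_rows` is made up for this example, it is not part of any library), the same recurrence can be run while keeping just two rows in memory:

```ruby
# Edit distance that keeps only two rows of the DP table at a time.
# `levenshtein_two_rows` is a hypothetical name chosen for this sketch.
def levenshtein_two_rows(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  # Row for the empty prefix of s: turning "" into t[0, j] takes j insertions.
  prev = (0..t.length).to_a

  s.each_char.with_index(1) do |sc, i|
    curr = [i] # turning s[0, i] into "" takes i deletions
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [
        prev[j] + 1,       # deletion
        curr[j - 1] + 1,   # insertion
        prev[j - 1] + cost # substitution (or match when cost is 0)
      ].min
    end
    prev = curr
  end

  prev.last
end

puts levenshtein_two_rows('fire', 'water')
```

The result should match the table-based version; only the memory use changes, from O(m*n) cells to O(n).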
There is a utility method in RubyGems that really should be public, but it isn't. You can still get at it:
require "rubygems/text"
ld = Class.new.extend(Gem::Text).method(:levenshtein_distance)
p ld.call("asd", "sdf") # => 2
Much simpler. I'm a Ruby show-off at times...
# Levenshtein distance, translated from wikipedia pseudocode by ross
def lev s, t
  return t.size if s.empty?
  return s.size if t.empty?
  return [ (lev s.chop, t) + 1,
           (lev s, t.chop) + 1,
           (lev s.chop, t.chop) + (s[-1, 1] == t[-1, 1] ? 0 : 1)
         ].min
end
Ruby 2.3 and later ship with the did_you_mean gem, which includes DidYouMean::Levenshtein.distance. It is fit for most cases and available by default.
DidYouMean::Levenshtein.distance("Test", "est") # => 1
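As a small sketch of how this maps onto the question, `compare` (the name used in the question, not something did_you_mean provides) could simply delegate to `DidYouMean::Levenshtein.distance`. The explicit `require` is there in case the gem has not been loaded automatically:

```ruby
require "did_you_mean"

# `compare` is the asker's name for the function; here it only delegates
# to the Levenshtein implementation that ships with did_you_mean.
def compare(a, b)
  DidYouMean::Levenshtein.distance(a, b)
end

pairs = [
  ["Test", "est"],
  ["Test", "Tes"],
  ["Test", "Tast"],
  ["Test", "Taste"],
  ["Test", "tazT"]
]

pairs.each do |a, b|
  puts "compare(#{a.inspect}, #{b.inspect}) = #{compare(a, b)}"
end
```

Note that the comparison is case-sensitive, and a plain Levenshtein distance for the last pair is 4 (four substitutions), not the 5 listed in the question.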
I made a damerau-levenshtein gem where the algorithms are implemented in C:
require "damerau-levenshtein"
dl = DamerauLevenshtein
dl.distance("Something", "Smoething") #returns 1
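To show why that pair comes out as 1 rather than 2, here is a pure-Ruby sketch of the "optimal string alignment" variant of Damerau-Levenshtein, which counts a swap of two adjacent characters as a single edit. This is not the gem's C implementation, and `osa_distance` is a name invented for the example:

```ruby
# Optimal string alignment distance: Levenshtein plus adjacent transpositions.
# `osa_distance` is a hypothetical helper, not part of the damerau-levenshtein gem.
def osa_distance(s, t)
  m = s.length
  n = t.length
  d = Array.new(m + 1) { Array.new(n + 1, 0) }
  (0..m).each { |i| d[i][0] = i } # delete i characters
  (0..n).each { |j| d[0][j] = j } # insert j characters

  (1..m).each do |i|
    (1..n).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,       # deletion
        d[i][j - 1] + 1,       # insertion
        d[i - 1][j - 1] + cost # substitution (or match)
      ].min

      # Adjacent transposition, e.g. "om" <-> "mo", counts as one edit.
      if i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[m][n]
end

puts osa_distance("Something", "Smoething")
```

Plain Levenshtein would need two substitutions for the swapped "om"/"mo", while the transposition rule covers it in one step.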
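I like DigitalRoss' solution above. However, as dawg pointed out, its runtime grows on the order of O(3^n), which is no good for longer strings. That solution can be sped up significantly using memoization, or "dynamic programming":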
def lev(string1, string2, memo={})
  return memo[[string1, string2]] if memo[[string1, string2]]
  return string2.size if string1.empty?
  return string1.size if string2.empty?
  min = [ lev(string1.chop, string2, memo) + 1,
          lev(string1, string2.chop, memo) + 1,
          lev(string1.chop, string2.chop, memo) + (string1[-1] == string2[-1] ? 0 : 1)
        ].min
  memo[[string1, string2]] = min
  min
end
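Then we have a much better runtime. It is not linear, though: with memoization each pair of prefixes is solved only once, and there are (m+1)(n+1) such pairs for strings of lengths m and n, so the number of subproblems grows quadratically instead of exponentially.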
[9] pry(main)> require 'benchmark'
=> true
[10] pry(main)> @memo = {}
=> {}
[11] pry(main)> Benchmark.realtime{puts lev("Hello darkness my old friend", "I've come to talk with you again")}
26
=> 0.007071999832987785
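To get a feel for the difference without relying on wall-clock timings, one can count the work directly. The sketch below instruments the naive recursion with a call counter and passes the memo into the memoized version explicitly so its size can be inspected afterwards. The names `lev_naive` and `lev_memo` and the counter hash are made up for this illustration:

```ruby
# Naive recursion with a call counter (the counter is a hypothetical addition).
def lev_naive(s, t, counter)
  counter[:calls] += 1
  return t.size if s.empty?
  return s.size if t.empty?
  [
    lev_naive(s.chop, t, counter) + 1,
    lev_naive(s, t.chop, counter) + 1,
    lev_naive(s.chop, t.chop, counter) + (s[-1] == t[-1] ? 0 : 1)
  ].min
end

# Memoized recursion; every solved prefix pair is stored in `memo`.
def lev_memo(s, t, memo)
  key = [s, t]
  return memo[key] if memo.key?(key)
  memo[key] =
    if s.empty?
      t.size
    elsif t.empty?
      s.size
    else
      [
        lev_memo(s.chop, t, memo) + 1,
        lev_memo(s, t.chop, memo) + 1,
        lev_memo(s.chop, t.chop, memo) + (s[-1] == t[-1] ? 0 : 1)
      ].min
    end
end

a = "kitten"
b = "sitting"

counter = { calls: 0 }
naive_result = lev_naive(a, b, counter)

memo = {}
memo_result = lev_memo(a, b, memo)

puts "naive:    distance #{naive_result}, #{counter[:calls]} calls"
puts "memoized: distance #{memo_result}, #{memo.size} stored subproblems"
puts "bound:    #{(a.size + 1) * (b.size + 1)} prefix pairs"
```

The number of stored subproblems can never exceed (m+1)(n+1), while the naive call count keeps multiplying as the strings get longer.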