edit-distance - Detailed distance between words

Question

How would I go about displaying the detailed distance between words? For example, the output of the program could be:

Words are "car" and "cure":
Replace "a" with "u".
Add "e".

Levenshtein distance does not meet my needs (I think).

score 1 · Accepted Answer

Try the following. The algorithm roughly follows Wikipedia (Levenshtein distance). The language used below is Ruby.

As an example, take the case where s is changed into t, as follows:

s = 'Sunday'
t = 'Saturday'

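First, s and t are turned into arrays, with an empty string inserted at the beginning. m will eventually be the matrix used in the algorithm.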

s = ['', *s.split('')]
t = ['', *t.split('')]
m = Array.new(s.length){[]}

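Here, however, m differs from the matrix given in the algorithm on Wikipedia: each cell holds not only the Levenshtein distance but also the (non-)operation (starting, doing nothing, deletion, insertion, or substitution) that was used to reach that cell from an adjacent (left, upper, or upper-left) cell. It may also hold a string describing the parameters of the operation. That is, each cell has the format: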

[Levenshtein distance, operation(, string)]
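
To make the cell format concrete, here is a small sketch with two cells written out by hand. The variable names no_op and insert are only for illustration; the values are what the 'Sunday'/'Saturday' example would hold in m[1][1] and m[1][2].

```ruby
# Two hand-written cells in the [distance, operation(, string)] format.
no_op  = [0, "did nothing"]                    # no string: nothing to describe
insert = [1, "inserted", "'a' at position 1"]  # the string describes the operation

# Destructuring works for both shapes; a missing string simply becomes nil.
[no_op, insert].each do |distance, op, str|
  puts "#{distance} #{op} #{str}".strip
end
```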

This is the main routine. It fills in the cells of m following the algorithm:

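s.each_with_index{|a, i| t.each_with_index{|b, j|
    m[i][j] =
    if i.zero? && j.zero?
        [0, "started"]
    elsif i.zero?
        # first row: an empty prefix of s can only become t[1..j] by insertions
        [j, "inserted", "'#{b}' at position #{j-1}"]
    elsif j.zero?
        # first column: s[1..i] can only become an empty prefix of t by deletions
        [i, "deleted", "'#{a}' at position #{i-1}"]
    elsif a == b
        [m[i-1][j-1][0], "did nothing"]
    else
        # costs of the upper (delete), left (insert) and upper-left (substitute) cells
        del, ins, subs = m[i-1][j][0], m[i][j-1][0], m[i-1][j-1][0]
        case [del, ins, subs].min
        when del
            [del+1, "deleted", "'#{a}' at position #{i-1}"]
        when ins
            [ins+1, "inserted", "'#{b}' at position #{j-1}"]
        when subs
            [subs+1, "substituted", "'#{a}' at position #{i-1} with '#{b}'"]
        end
    end
}}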
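
As a sanity check, the first element of the bottom-right cell, m[-1][-1][0], should equal the plain Levenshtein distance. Below is a minimal sketch of that plain distance, kept separate from the routine above; the method name levenshtein is a hypothetical name chosen for illustration, and the two-row layout is just one common way to write it.

```ruby
# Plain Levenshtein distance, for comparison with m[-1][-1][0].
def levenshtein(s, t)
  # prev[j] is the distance between the prefix of s processed so far and t[0, j]
  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |a, i|
    curr = [i]  # turning i characters into the empty string costs i deletions
    t.each_char.with_index(1) do |b, j|
      cost = a == b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein('Sunday', 'Saturday')
puts levenshtein('car', 'cure')
```

For 'Sunday' and 'Saturday' this gives 3, which matches the three operations printed at the end of this answer.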

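Now we set i, j to the bottom-right corner of m and follow the steps backward, unshifting the content of each cell into an array called steps, until we reach the start.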

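i, j = s.length-1, t.length-1
steps = []
loop do
    cell = m[i][j]
    break if cell[1] == "started"
    # record the current cell before moving to the cell it was reached from,
    # so that an operation in the bottom-right cell is not lost
    steps.unshift(cell)
    case cell[1]
    when "did nothing", "substituted"
        i -= 1; j -= 1
    when "deleted"
        i -= 1
    when "inserted"
        j -= 1
    end
end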

Then we print the operation and the string for each step, unless it is a non-operation.

steps.each do |d, op, str=''|
    puts "#{op} #{str}" unless op == "did nothing" or op == "started"
end

For this particular example, it will output:

inserted 'a' at position 1
inserted 't' at position 2
substituted 'n' at position 2 with 'r'
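
To try this on the words from the question, the pieces can be wrapped into one method. This is a sketch rather than a definitive version: the name edit_steps is hypothetical, it assumes the first row and column are treated as insertions and deletions, and it records each cell before stepping back so that an operation in the last cell is kept.

```ruby
# Hypothetical helper: returns the list of edit operations that turn s into t.
def edit_steps(s, t)
  s = ['', *s.chars]
  t = ['', *t.chars]
  m = Array.new(s.length) { [] }

  # Fill the matrix; each cell is [distance, operation(, description)].
  s.each_with_index do |a, i|
    t.each_with_index do |b, j|
      m[i][j] =
        if i.zero? && j.zero?
          [0, "started"]
        elsif i.zero?
          [j, "inserted", "'#{b}' at position #{j - 1}"]
        elsif j.zero?
          [i, "deleted", "'#{a}' at position #{i - 1}"]
        elsif a == b
          [m[i - 1][j - 1][0], "did nothing"]
        else
          del, ins, subs = m[i - 1][j][0], m[i][j - 1][0], m[i - 1][j - 1][0]
          case [del, ins, subs].min
          when del
            [del + 1, "deleted", "'#{a}' at position #{i - 1}"]
          when ins
            [ins + 1, "inserted", "'#{b}' at position #{j - 1}"]
          else
            [subs + 1, "substituted", "'#{a}' at position #{i - 1} with '#{b}'"]
          end
        end
    end
  end

  # Walk back from the bottom-right cell, recording each cell before moving.
  i = s.length - 1
  j = t.length - 1
  steps = []
  until m[i][j][1] == "started"
    _, op, str = m[i][j]
    steps.unshift("#{op} #{str}") unless op == "did nothing"
    case op
    when "did nothing", "substituted"
      i -= 1
      j -= 1
    when "deleted"
      i -= 1
    when "inserted"
      j -= 1
    end
  end
  steps
end

puts edit_steps('car', 'cure')
```

For "car" and "cure" this prints a substitution of 'a' with 'u' followed by an insertion of 'e', which is the kind of output asked for in the question. Positions are zero-based, as in the output above.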

score 0 · Accepted Answer

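class Solution:
   def solve(self, text, word0, word1):
      # Smallest number of words between an occurrence of word0 and one of word1
      word_list = text.split()
      ans = len(word_list)
      L = None  # index of the most recent occurrence of either word
      for R in range(len(word_list)):
         if word_list[R] == word0 or word_list[R] == word1:
            if L is not None and word_list[R] != word_list[L]:
               ans = min(ans, R - L - 1)
            # always remember the latest match, not only when ans is updated
            L = R
      # -1 means the two words never both occur in the text
      return -1 if ans == len(word_list) else ans
ob = Solution()
text = "cat dog abcd dog cat cat abcd dog wxyz"
word0 = "abcd"
word1 = "wxyz"
print(ob.solve(text, word0, word1))  # prints 1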
