I came here looking for a diff algorithm and ended up writing my own implementation. Sorry, I don't know anything about vcdiff.
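Wikipedia: From a longest common subsequence it is only a small step to get diff-like output: if an item is absent in the subsequence but present in the original, it must have been deleted. (The "-" marks, below.) If it is absent in the subsequence but present in the second sequence, it must have been added. (The "+" marks.)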
A nice animation of the LCS algorithm is here.
A link to a fast LCS Ruby implementation is here.
My slow and simple Ruby adaptation is below.
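# Naive recursive longest common subsequence of two arrays.
# Simple, but exponential in the worst case.
def lcs(xs, ys)
  if xs.count > 0 and ys.count > 0
    xe, *xb = xs
    ye, *yb = ys
    if xe == ye
      # Matching heads belong to the LCS.
      return [xe] + lcs(xb, yb)
    end
    # Otherwise drop one head from either side and keep the longer result.
    a = lcs(xs, yb)
    b = lcs(xb, ys)
    return (a.length > b.length) ? a : b
  end
  return []
end
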
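# Walk both sequences alongside their LCS: items skipped in `modified`
# are additions ("+"), items skipped in `original` are deletions ("-").
def find_diffs(original, modified, subsequence)
  result = []
  while subsequence.length > 0
    sfirst, *subsequence = subsequence
    # Everything in `modified` before the next common item was added.
    while modified.length > 0
      mfirst, *modified = modified
      break if mfirst == sfirst
      result << "+#{mfirst}"
    end
    # Everything in `original` before the next common item was deleted.
    while original.length > 0
      ofirst, *original = original
      break if ofirst == sfirst
      result << "-#{ofirst}"
    end
    result << "#{sfirst}"
  end
  # Leftovers after the last common item.
  while modified.length > 0
    mfirst, *modified = modified
    result << "+#{mfirst}"
  end
  while original.length > 0
    ofirst, *original = original
    result << "-#{ofirst}"
  end
  return result
end
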
def pretty_diff(original, modified)
  subsequence = lcs(modified, original)
  diffs = find_diffs(original, modified, subsequence)
  puts 'ORIG [' + original.join(', ') + ']'
  puts 'MODIFIED [' + modified.join(', ') + ']'
  puts 'LCS [' + subsequence.join(', ') + ']'
  puts 'DIFFS [' + diffs.join(', ') + ']'
end

pretty_diff("human".scan(/./), "chimpanzee".scan(/./))
# ORIG [h, u, m, a, n]
# MODIFIED [c, h, i, m, p, a, n, z, e, e]
# LCS [h, m, a, n]
# DIFFS [+c, h, +i, -u, m, +p, a, n, +z, +e, +e]
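
The recursive lcs above re-solves the same subproblems over and over, so it gets very slow on longer inputs. The usual remedy is a table-based (dynamic programming) version that runs in time proportional to the product of the two lengths. The following is only a minimal sketch of that idea, not the fast implementation mentioned earlier; the name lcs_dp is made up for this example.

```ruby
# Bottom-up dynamic-programming LCS, O(n * m) time and space.
# NOTE: lcs_dp is a hypothetical name chosen for this sketch.
def lcs_dp(xs, ys)
  n = xs.length
  m = ys.length

  # table[i][j] holds the LCS length of xs[i..] and ys[j..].
  table = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      table[i][j] =
        if xs[i] == ys[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end

  # Walk the table from the top-left corner to recover one LCS.
  result = []
  i = 0
  j = 0
  while i < n && j < m
    if xs[i] == ys[j]
      result << xs[i]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      i += 1
    else
      j += 1
    end
  end
  result
end

puts lcs_dp("chimpanzee".chars, "human".chars).join(', ')
```

It takes the same two arrays as lcs, so it should be usable as a drop-in replacement inside pretty_diff. When several subsequences tie for the longest length, it may pick a different one than the recursive version, but the resulting diff is still valid.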