你可以尝试这样的事情:
list1=[[['B', 10], 1], [['C', 15], 1], [['F', 30], 1]]
list2=[[['G', 20], 2], [['D', 25], 1]]
list2_iter = iter(list2)
for item1 in list1:
in_list2 = False
for item2 in list2_iter:
if item1[1] == item2[1]:
print "Match" if item1 == item2 else "Modified", item1, item2
in_list2 = True
break
else:
print "Inserted", item2
if not in_list2:
print "Deleted", item1
请注意,内部循环 forlist2
使用的是迭代器,因此它不是从list2
每次的开头开始,而是从它在外部循环的上一次迭代中停止的最后一个元素开始。其他一切都相当简单。
输出:
Inserted [['G', 20], 2]
Modified [['B', 10], 1] [['D', 25], 1]
Deleted [['C', 15], 1]
Deleted [['F', 30], 1]
请注意,这并不总是能找到最佳匹配,因为它只会通过两者,list1
和list2
。如需更完整的算法,请查看Diff and Longest Common Subsequence。
更新:我刚刚意识到这个问题实际上与找到两个字符串的最小编辑距离几乎相同,为此我碰巧有一些代码。经过一些概括:
import operator
def diff(s1, s2, match=operator.eq, neutral="*", subst=2):
"""s1, s2: the two strings or lists to match
match: function to determine whether two elements match up
neutral: 'neutral' element for padding
subst: substitution costs
"""
s1, s2 = neutral + s1, neutral + s2
# calculate edit distance / sequence match with DP
A = [[0] * len(s2) for i in range(len(s1))]
for i in range(len(s1)):
for k in range(len(s2)):
if min(i, k) == 0:
A[i][k] = max(i, k)
else:
diag = 0 if match(s1[i], s2[k]) else subst
A[i][k] = min(A[i-1][k-1] + diag,
A[i ][k-1] + 1,
A[i-1][k ] + 1)
# reconstruct path
path, i, k = [], len(s1)-1, len(s2)-1
while i or k:
if A[i][k] == A[i-1][k] + 1:
path, i = [-1] + path, i-1
elif A[i][k] == A[i][k-1] + 1:
path, k = [1] + path, k-1
else:
path, i, k = [0] + path, i-1, k-1
return A[len(s1)-1][len(s2)-1], path
def print_match(list1, list2, path):
i1, i2 = iter(list1), iter(list2)
for p in path:
if p == -1: print "DEL %20r" % next(i1)
if p == 0: print "EQ %20r %20r" % (next(i1), next(i2))
if p == +1: print "INS %41r" % next(i2)
# with strings
word1, word2 = "INTENTION", "EXECUTION"
x, path = diff(word1, word2)
print_match(word1, word2, path)
# with your lists of lists
list1 = [[['B', 10], 1], [['C', 15], 1], [['F', 30], 1]]
list2 = [[['G', 20], 2], [['D', 25], 1]]
x, path = diff(list1, list2, match=lambda x, y: x[1] == y[1], neutral=[[None, -1]])
print_match(list1, list2, path)
请注意,生成的匹配仍然可能与您的预期不同,因为通常有许多相同的最佳方式来匹配元素,但它仍然应该是最佳的。