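The code below takes two strings, compares them, and should return both the parts they have in common and the parts that differ, with spaces filling the gaps (keeping the length of the longest string).
The commented-out lines in the code are the 4 strings that should be returned.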
from difflib import SequenceMatcher
t1 = 'betty: backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self'
t2 = 'bettyv: backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self'
#t1 = 'betty : backstreetvboysareback" i e "LAlarrygarryhannyhref=" n "_self'
#t2 = 'betty : backstreetvboysareback" i e "LAlarrygarryhannyhref=" n "_self'
#o1 = ' g v .jpg g '
#o2 = ' v l f islike i .php '
matcher = SequenceMatcher(None, t1, t2)
blocks = matcher.get_matching_blocks()
bla1 = []
bla2 = []
# Collect the text between consecutive matching blocks as [text, start, end].
for i in range(len(blocks)):
    if i != len(blocks)-1:
        bla1.append([t1[blocks[i].a + blocks[i].size:blocks[i+1].a], blocks[i].a + blocks[i].size, blocks[i+1].a])
        bla2.append([t2[blocks[i].b + blocks[i].size:blocks[i+1].b], blocks[i].b + blocks[i].size, blocks[i+1].b])
cnt = 0
# Pad with spaces so that each pair of differences starts and ends at the same position.
for i in range(len(bla1)):
    if bla1[i][1] < bla2[i][1]:
        num = bla2[i][1] - bla1[i][1]
        t2 = t2[0:bla2[i][1]] + ' '*num + t2[bla2[i][1]:len(t2)]
        bla2[i][0] = ' '*num + bla2[i][0]
        bla2[i][1] = bla1[i][1]
    if bla2[i][1] < bla1[i][1]:
        num = bla1[i][1] - bla2[i][1]
        t1 = t1[0:bla1[i][1]] + ' '*num + t1[bla1[i][1]:len(t1)]
        bla1[i][0] = ' '*num + bla1[i][0]
        bla1[i][1] = bla2[i][1]
    if bla1[i][2] > bla2[i][2]:
        num = bla1[i][2] - bla2[i][2]
        t2 = t2[0:bla2[i][2]] + ' '*num + t2[bla2[i][2]:len(t2)]
        bla2[i][0] = bla2[i][0] + ' '*num
        bla2[i][2] = bla1[i][2]
    if bla2[i][2] > bla1[i][2]:
        num = bla2[i][2] - bla1[i][2]
        t1 = t1[0:bla1[i][2]] + ' '*num + t1[bla1[i][2]:len(t1)]
        bla1[i][0] = bla1[i][0] + ' '*num
        bla1[i][2] = bla2[i][2]
t11 = []
t11 = t1[0:bla1[0][1]]
t11 += t1[bla1[0][2]:bla1[1][1]]
t11 += t1[bla1[1][2]:bla1[2][1]]
t11 += t1[bla1[2][2]:bla1[3][1]]
t11 += t1[bla1[3][2]:bla1[4][1]]
t11 += t1[bla1[5][2]:bla1[6][1]]
t11 += t1[bla1[6][2]:len(t1)]
t12 = []
t12 = t2[0:bla1[0][1]]
t12 += t2[bla1[0][2]:bla1[1][1]]
t12 += t2[bla1[1][2]:bla1[2][1]]
t12 += t2[bla1[2][2]:bla1[3][1]]
t12 += t2[bla1[3][2]:bla1[4][1]]
t12 += t2[bla1[5][2]:bla1[6][1]]
t12 += t2[bla1[6][2]:len(t2)]
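After arranging the blocks into an organized format, bla1 and bla2 hold
every difference for their respective string.
Each difference is stored as a list containing the differing text with its start and end positions, e.g. ['v', 33, 34],
one list per difference in each string. After this, I try to insert spaces to match the required length and spacing, and this is where the code starts to break.
If anyone could take a look, please!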
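
For reference, here is a minimal sketch of the output I am after, written with `SequenceMatcher.get_opcodes()` instead of editing `t1` and `t2` in place. It builds the result segment by segment, so earlier padding never shifts later indices. `align_diff` is a hypothetical helper name, not part of the code above, and it returns three strings rather than four because the two "common" strings are identical by construction. Each differing segment is padded to the width of the longer side, so the result can be longer than the longest input when the differences vary in both directions.

```python
from difflib import SequenceMatcher


def align_diff(a, b):
    """Return (common, only_a, only_b), three strings of equal length.

    `common` keeps the matching text and blanks out the differences;
    `only_a` and `only_b` keep each string's differences and blank out the rest.
    """
    common, only_a, only_b = [], [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            width = i2 - i1
            common.append(a[i1:i2])
            only_a.append(' ' * width)
            only_b.append(' ' * width)
        else:
            # 'replace', 'delete' or 'insert': pad the shorter side with
            # spaces so all three outputs stay column-aligned.
            width = max(i2 - i1, j2 - j1)
            common.append(' ' * width)
            only_a.append(a[i1:i2].ljust(width))
            only_b.append(b[j1:j2].ljust(width))
    return ''.join(common), ''.join(only_a), ''.join(only_b)


t1 = 'betty: backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self'
t2 = 'bettyv: backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self'

common, o1, o2 = align_diff(t1, t2)
print(repr(common))
print(repr(o1))
print(repr(o2))
```

Printing with `repr` keeps the leading and trailing spaces visible, which makes it easier to check that the three strings line up column by column.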