当我尝试以difflib
您的方式使用时,我遇到了同样的问题,因为对于大文件difflib
,将整个文件缓冲在内存中,然后进行比较。作为解决方案,您可以部分比较两个文件。在这里,我为每 100 行执行此操作。
import difflib
file1 = open('1.csv', 'r')
file2 = open('2.csv', 'r')
lines_file1 = []
lines_file2 = []
# i: number of line
# line: content of line
for i, line in enumerate(zip(file1, file2)):
# check if it is in line 100
if not (i % 100 == 0):
lines_file1.append(line[0])
lines_file2.append(line[1])
else:
# show the different for 100 line
diff = difflib.ndiff("".join(lines_file1), "".join(lines_file2))
print ''.join(list(diff))
lines_file1 = []
lines_file2 = []
# show the different if any lines left
diff = difflib.ndiff("".join(lines_file1), "".join(lines_file2))
print ''.join(list(diff))
file1.close()
file2.close()
希望能帮助到你。