
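I have the following code to compare two files. I expected it to work when I point it at files of 4 or 5 MB. Instead, the cursor in the Python console just blinks and no output is shown. Once I let it run overnight, and it was still blinking the next morning. What can I change in this code?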

import difflib

file1 = open('/home/michel/Documents/first.csv', 'r')
file2 = open('/home/michel/Documents/second.csv', 'r')

diff = difflib.ndiff(file1.readlines(), file2.readlines())
delta = ''.join(diff)
print delta

2 Answers


If you are on a Linux-based system, you can call the external diff command and use its output. I tried the diff command on two files of 14 MB and 9.3 MB. It took 1.3 seconds.

real    0m1.295s
user    0m0.056s
sys     0m0.192s
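
A minimal sketch of calling diff from Python, assuming Python 3 and a diff binary on PATH. The helper name external_diff and the -u (unified output) flag are illustrative choices, not part of the answer above.

```python
import shutil
import subprocess


def external_diff(path_a, path_b):
    """Return the output of `diff -u path_a path_b` as text.

    diff exits with 0 when the files are identical, 1 when they differ,
    and a value greater than 1 on error, so only the last case is raised.
    """
    if shutil.which("diff") is None:
        raise RuntimeError("the diff command was not found on PATH")
    result = subprocess.run(
        ["diff", "-u", path_a, path_b],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode stdout/stderr as text
    )
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

With the paths from the question, this would be called as external_diff('/home/michel/Documents/first.csv', '/home/michel/Documents/second.csv'). An empty string means the files are identical.
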
answered 2014-11-14T17:32:22.460

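I ran into the same problem when I used difflib the way you do. For large files, both files are read into memory in full and then compared in a single pass, and the running time of ndiff grows much faster than linearly with the number of lines. As a workaround, you can compare the two files piece by piece. Here I do it 100 lines at a time. Note that this only gives meaningful output when the two files stay aligned line for line.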

import difflib

lines_file1 = []
lines_file2 = []

with open('1.csv', 'r') as file1, open('2.csv', 'r') as file2:
    # i: line number, starting at 1
    # line: pair of (line from file1, line from file2)
    # note: zip stops at the end of the shorter file
    for i, line in enumerate(zip(file1, file2), start=1):
        lines_file1.append(line[0])
        lines_file2.append(line[1])
        # every 100 lines, diff the chunk collected so far
        if i % 100 == 0:
            # pass lists of lines; joined strings would be compared
            # character by character
            diff = difflib.ndiff(lines_file1, lines_file2)
            print(''.join(diff))
            lines_file1 = []
            lines_file2 = []

# show the differences for any lines left over
diff = difflib.ndiff(lines_file1, lines_file2)
print(''.join(diff))
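
The chunked comparison described in this answer can also be sketched as a reusable generator. The function name chunked_ndiff and the chunk_size parameter are hypothetical names introduced for illustration. Unlike zip, this version keeps reading after the shorter input ends, so trailing lines of the longer file show up as additions or deletions.

```python
import difflib
from itertools import islice


def chunked_ndiff(lines_a, lines_b, chunk_size=100):
    """Yield ndiff output for two line iterables, chunk_size lines at a time.

    Assumption: the inputs stay roughly aligned line for line. An inserted
    or deleted line shifts everything after it, so later chunks may report
    differences that a whole-file diff would not.
    """
    iter_a, iter_b = iter(lines_a), iter(lines_b)
    while True:
        # Take the next chunk_size lines from each input (fewer at the end).
        chunk_a = list(islice(iter_a, chunk_size))
        chunk_b = list(islice(iter_b, chunk_size))
        if not chunk_a and not chunk_b:
            break
        for out_line in difflib.ndiff(chunk_a, chunk_b):
            yield out_line
```

Open file objects can be passed directly, for example chunked_ndiff(file1, file2), and the result can be written out line by line instead of being joined into one large string.
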

Hope this helps.

answered 2014-11-15T15:10:19.453