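I have two files, output1.txt and output2.txt, each with tens of thousands of lines. I want to go through both files and return the lines (and their content) that differ between them. They are mostly identical, which is why I can't spot the differences by eye (filecmp.cmp returns False).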
			
3 Answers
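7.4. difflib: Helpers for computing deltas
New in version 2.1.
This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML, context diffs and unified diffs. For comparing directories and files, see also the filecmp module.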
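A minimal sketch of using difflib for this, assuming both files fit in memory: `difflib.unified_diff` with `n=0` drops the unchanged context, so what remains is the `---`/`+++` file header, `@@` hunk headers giving the line ranges in each file, and the differing lines themselves. The helper names `changed_lines` and `diff_files` are only illustrative; the default file names reuse the question's output1.txt and output2.txt.

```python
import difflib


def changed_lines(old_lines, new_lines, old_name="output1.txt", new_name="output2.txt"):
    """Return unified-diff output with no context lines between two lists of lines."""
    # n=0 suppresses context, leaving only the '---'/'+++' header,
    # the '@@ -start,count +start,count @@' hunk headers and the '-'/'+' lines.
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name, n=0))


def diff_files(path1, path2):
    """Print the differing lines of two text files (both are read fully into memory)."""
    with open(path1) as f1, open(path2) as f2:
        a = f1.readlines()
        b = f2.readlines()
    for line in changed_lines(a, b, path1, path2):
        print(line, end="")  # lines already end with '\n'
```

Calling `diff_files('output1.txt', 'output2.txt')` prints only the changed hunks, and an empty result means the files match line for line. Matching can slow down noticeably when the two files differ a lot, since SequenceMatcher is not linear in the worst case.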
answered 2013-07-23T00:39:31.093
    
    
As long as you don't care about order, you can use:
with open('file1') as f:
    t1 = f.read().splitlines()
    t1s = set(t1)
with open('file2') as f:
    t2 = f.read().splitlines()
    t2s = set(t2)
# in file1 but not file2
# (list.index() gives the 0-based position of the first occurrence)
print("Only in file1")
for diff in t1s - t2s:
    print(t1.index(diff), diff)
# in file2 but not file1
print("Only in file2")
for diff in t2s - t1s:
    print(t2.index(diff), diff)
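If you want line numbers in file order without calling `list.index` (which rescans the list for every unique differing line), a sketch along these lines keeps the same order-insensitive idea. Unlike the set version, it reports every occurrence of a missing line rather than only the first, and it numbers lines from 1. `only_in` and `report` are hypothetical names.

```python
def only_in(lines_a, lines_b):
    """Yield (line_number, text) for lines of lines_a that never appear in lines_b.

    Line numbers are 1-based and results come out in file order. Like the set
    version, a line present anywhere in lines_b counts as "not different".
    """
    present_in_b = set(lines_b)  # O(1) membership tests
    for number, text in enumerate(lines_a, start=1):
        if text not in present_in_b:
            yield number, text


def report(path1, path2):
    """Print the lines unique to each file, with their line numbers."""
    with open(path1) as f:
        t1 = f.read().splitlines()
    with open(path2) as f:
        t2 = f.read().splitlines()
    print("Only in", path1)
    for number, text in only_in(t1, t2):
        print(number, text)
    print("Only in", path2)
    for number, text in only_in(t2, t1):
        print(number, text)
```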
Edit: if you do care about order and the files are nearly identical, why not just use the diff command?
answered 2013-07-23T00:40:25.410
    
    
You can do it like this:
import difflib
import sys

tl = 100000  # large number of lines
# create two test files (Unix-style /tmp paths; adjust them on Windows)
with open('/tmp/f1.txt', 'w') as f:
    for x in range(tl):
        f.write('line {}\n'.format(x))
with open('/tmp/f2.txt', 'w') as f:
    for x in range(tl + 10):  # add 10 lines at the end
        if x in (500, 505, 1000, tl - 2):
            continue  # skip these lines
        f.write('line {}\n'.format(x))
with open('/tmp/f1.txt', 'r') as f1, open('/tmp/f2.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    for line in diff:
        if line.startswith('-'):
            sys.stdout.write(line)  # only in f1
        elif line.startswith('+'):
            sys.stdout.write('\t\t' + line)  # only in f2, indented
Prints (in about 400 ms):
- line 500
- line 505
- line 1000
- line 99998
        + line 100000
        + line 100001
        + line 100002
        + line 100003
        + line 100004
        + line 100005
        + line 100006
        + line 100007
        + line 100008
        + line 100009
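If you want a count, use enumerate (note that it numbers the entries of the ndiff output, so it is not the line number in either file):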
with open('/tmp/f1.txt', 'r') as f1, open('/tmp/f2.txt', 'r') as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
    for i, line in enumerate(diff):
        if line.startswith(' '):
            continue  # skip unchanged lines
        sys.stdout.write('My count: {}, text: {}'.format(i, line))
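To get line numbers that refer to each file rather than to the ndiff output, one option (a sketch that swaps `ndiff` for `difflib.SequenceMatcher.get_opcodes()`) is to walk the opcodes, which give index ranges into both input lists. `numbered_changes` is a hypothetical name.

```python
import difflib


def numbered_changes(a, b):
    """Yield (sign, line_number, text) for every changed line.

    sign is '-' for a line only in a (numbered within a) and '+' for a line
    only in b (numbered within b). Line numbers are 1-based.
    """
    # autojunk=False disables the "popular line" heuristic, which can
    # otherwise misalign files containing many repeated lines.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                yield "-", i + 1, a[i]
        if tag in ("replace", "insert"):
            for j in range(j1, j2):
                yield "+", j + 1, b[j]
```

With the test files above you would call `numbered_changes(f1.readlines(), f2.readlines())`; the removed `line 500` is then reported with line number 501, because the numbers are 1-based positions in f1.txt.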
answered 2013-07-23T03:50:14.103