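I have 2 files named "hosts" (in different directories).

I want to compare them using Python to see whether they are identical. If they are not, I want to print the differences on screen.

So far I have tried this: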

hosts0 = open(dst1 + "/hosts","r") 
hosts1 = open(dst2 + "/hosts","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

But when I run this, I get:

File "./audit.py", line 34, in <module>
  if lines2 != lines1[i]:
IndexError: list index out of range

This means one of the hosts files has more lines than the other. Is there a better way to compare 2 files and report the differences?

5 Answers

import difflib

lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()

lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()

# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
    print(line)

This outputs the following:

--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
 dog
-cat
 bird
 buffalo
-gophers
-hound
+gopher
 horse
+mouse

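This diff gives you context: surrounding lines that help make it clear how the files differ. You can see "cat" twice here, because it was removed from below "dog" and added above it.

You can use n=0 to remove the context.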

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    print(line)

This outputs:

--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse

But now it is full of "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    for prefix in ('---', '+++', '@@'):
        if line.startswith(prefix):
            break
    else:
        print(line)

That gives us this output:

+cat
-cat
-gophers
-hound
+gopher
+mouse

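Now what do you want it to do? If you ignore all removed lines, you won't see that "hound" was removed. If you are happy showing only the additions to the file, you could do this: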

diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']

print('additions:')
for line in added:
    print(line)
print('')
print('additions, ignoring position:')
for line in added:
    if line not in removed:
        print(line)

Output:

additions:
cat
gopher
mouse

additions, ignoring position:
gopher
mouse

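You have probably figured out by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.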
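One of those ways, sketched here for Python 3 and assuming both files are small text files: check whether the files are identical first, and only build a diff when they are not. The function name `diff_files` is illustrative and does not come from the question; `filecmp.cmp` and `difflib.unified_diff` are both in the standard library.

```python
import difflib
import filecmp


def diff_files(path_a, path_b):
    """Return unified-diff lines for two text files, or [] if they are identical.

    Hypothetical helper for illustration only.
    """
    # shallow=False makes filecmp compare file contents rather than
    # relying only on os.stat() information (type, size, mtime).
    if filecmp.cmp(path_a, path_b, shallow=False):
        return []
    with open(path_a) as file_a, open(path_b) as file_b:
        return list(difflib.unified_diff(
            file_a.readlines(),
            file_b.readlines(),
            fromfile=path_a,
            tofile=path_b,
        ))
```

With the question's variables, the result could be printed by looping over `diff_files(dst1 + "/hosts", dst2 + "/hosts")` and passing each line to `sys.stdout.write`, since the lines keep their trailing newlines.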

answered 2013-10-02T00:14:21.070

The difflib library is useful for this, and it is part of the standard library. I like the unified diff format.

http://docs.python.org/2/library/difflib.html#difflib.unified_diff

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
        )
        for line in diff:
            sys.stdout.write(line)

Output:

--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
 one
 two
-dogs
 three

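Here is a dodgy version that ignores certain lines. There may be edge cases where it doesn't work, and there are certainly better ways to do this, but maybe it is good enough for your purposes.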

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
            n=0,
        )
        for line in diff:
            for prefix in ('---', '+++', '@@'):
                if line.startswith(prefix):
                    break
            else:
                sys.stdout.write(line[1:])
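
One such edge case: a removed line whose own text starts with "--" appears in the diff as "---...", so the prefix filter above drops it. A possible way around this, sketched for Python 3, is to skip the text parsing and read the opcodes from `difflib.SequenceMatcher` directly. The function name `added_and_removed` is made up for this example.

```python
import difflib


def added_and_removed(old_lines, new_lines):
    """Return (added, removed) lists of lines, without parsing diff prefixes.

    Hypothetical helper for illustration only.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    added, removed = [], []
    # Each opcode describes how old_lines[i1:i2] maps to new_lines[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('replace', 'delete'):
            removed.extend(old_lines[i1:i2])
        if tag in ('replace', 'insert'):
            added.extend(new_lines[j1:j2])
    return added, removed
```

Called with the two `readlines()` results, this returns the changed lines as they appear in the files, so content that happens to look like a diff header is not lost.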
answered 2013-10-01T16:12:56.303

hosts0 = open("C:path\\a.txt","r")
hosts1 = open("C:path\\b.txt","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

The code above works for me. Can you point out the error you are facing?

answered 2013-10-01T16:01:48.077

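You can add a conditional statement. If the index goes past the end of your list, break out of the loop and print the rest of the file.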
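
A minimal Python 3 sketch of that idea. Instead of an explicit index check it uses `itertools.zip_longest`, which is a substitution on my part: it pads the shorter input with `None`, so the extra lines of the longer file are reported as differences rather than raising `IndexError`. The function name `compare_line_by_line` is illustrative.

```python
from itertools import zip_longest


def compare_line_by_line(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for each position that differs.

    Hypothetical helper for illustration only. A missing line in the
    shorter input is reported as None.
    """
    differences = []
    pairs = zip_longest(lines_a, lines_b)
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            differences.append((number, line_a, line_b))
    return differences
```

Unlike a diff, this compares strictly by position, so one inserted line near the top makes every following line count as different.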

answered 2013-10-01T18:49:54.027

import difflib

# Read both files. split() with no arguments breaks the text on any
# whitespace, so this compares the files word by word, not line by line.
with open('a.txt', 'r') as f:
    str1 = f.read().split()
with open('b.txt', 'r') as f1:
    str2 = f1.read().split()

d = difflib.Differ()
# compare() yields each word with a '- ', '+ ', '  ' or '? ' prefix
diff = list(d.compare(str2, str1))
print('\n'.join(diff))
answered 2015-09-17T11:17:11.740