python - Comparing two different files line by line in Python

Question

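I have two different files and I want to compare their contents line by line and write the content they have in common to a third file. Note that both files contain some blank lines. Here is my pseudocode: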

file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))

FO.close()
file1.close()
file2.close()

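However, when I do this I get a lot of blank lines in my FO file. It seems the blank lines the files have in common are written as well. I only want to write the text part. Can somebody please help me?

For example, my first file (file1) contains this data: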

Config:
Hostname = TUVALU

BT:
TS_Ball_Update_Threshold = 0.2

BT:
TS_Player_Search_Radius = 4

BT:
Ball_Template_Update = 0

and my second file (file2) contains this data:

Pole_ID      = 2
Width        = 1280
Height       = 1024
Color_Mode   = 0
Sensor_Scale = 1

Tracking_ROI_Size = 4
Ball_Template_Update = 0

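If you notice, the last two lines of each file are identical, so I want to write them to my FO file. The problem with my approach is that it also writes the blank lines the files have in common. Should I use a regular expression to solve this? I have no experience with regular expressions.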

score 93 · Accepted Answer

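This solution reads both files in one pass, excludes blank lines, and writes out the lines common to both, regardless of their position in the files: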

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
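
Two properties of the set-based approach are worth keeping in mind: a set has no order, so the common lines may be written in any order, and a final line without a trailing newline will not compare equal to the same text followed by '\n'. The following is a minimal sketch of an order-preserving variant. The helper name common_lines and the sample lists are made up for illustration; the lists stand in for the open file objects.

```python
def common_lines(lines1, lines2):
    """Return the non-blank lines of lines1 that also occur in lines2,
    in the order they appear in lines1, without duplicates."""
    # Strip trailing whitespace so a missing final newline does not hide a match.
    in_second = {line.rstrip() for line in lines2}
    emitted = set()
    result = []
    for line in lines1:
        text = line.rstrip()
        if text and text in in_second and text not in emitted:
            emitted.add(text)
            result.append(text)
    return result


# Hypothetical sample data; in practice pass the two open file objects.
file1_lines = ["Config:\n", "Hostname = TUVALU\n", "\n", "BT:\n", "Ball_Template_Update = 0\n"]
file2_lines = ["Pole_ID      = 2\n", "\n", "Tracking_ROI_Size = 4\n", "Ball_Template_Update = 0"]

print(common_lines(file1_lines, file2_lines))  # ['Ball_Template_Update = 0']
```

Because the function accepts any iterable of strings, it can be called as common_lines(file1, file2) inside the with blocks shown above.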

score 15 · Accepted Answer

Yet another example...

from __future__ import print_function #Only for Python2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)

If you want to eliminate the common blank lines, just change the if statement to:

if line1.strip() and line1 == line2:

.strip() removes all leading and trailing whitespace, so if that is all a line contains, it becomes the empty string "", which is considered false.
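
Note that zip pairs the lines up by position, so a line only counts as common if it sits at the same line number in both files. A small sketch of that behaviour follows; the function name same_position_matches and the sample lists are invented for illustration.

```python
def same_position_matches(lines1, lines2):
    """Return the non-blank lines that are equal at the same index in both inputs."""
    return [a for a, b in zip(lines1, lines2) if a.strip() and a == b]


# Hypothetical sample data standing in for the two files.
file1_lines = ["Config:\n", "\n", "BT:\n", "Ball_Template_Update = 0\n"]
file2_lines = ["Width = 1280\n", "\n", "Ball_Template_Update = 0\n"]

# The shared line is at index 3 in one list and index 2 in the other: no match.
print(same_position_matches(file1_lines, file2_lines))  # []

# Once the shared line is at the same index in both, it is reported.
shifted = ["Config:\n", "\n", "Ball_Template_Update = 0\n"]
print(same_position_matches(shifted, file2_lines))  # ['Ball_Template_Update = 0\n']
```

If the common lines can appear at different positions, a set-based comparison is the better fit.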

score 11 · Accepted Answer

If you are specifically looking for the differences between the two files, then this might help:

with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
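
set.difference is one-directional: it yields the lines that occur in the first file but not in the second. If both directions are of interest, a sketch along the following lines may help; the helper name line_differences is hypothetical, and the results are sorted only to make the output deterministic.

```python
def line_differences(lines1, lines2):
    """Return (only_in_first, only_in_second) as sorted lists of non-blank lines."""
    set1 = {line.rstrip("\n") for line in lines1}
    set2 = {line.rstrip("\n") for line in lines2}
    # Ignore blank lines on both sides.
    set1.discard("")
    set2.discard("")
    return sorted(set1 - set2), sorted(set2 - set1)


only_in_first, only_in_second = line_differences(["a\n", "b\n", "\n"], ["b\n", "c\n"])
print(only_in_first)   # ['a']
print(only_in_second)  # ['c']
```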

score 7 · Accepted Answer

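If order is preserved between the files, you might also prefer difflib. Although Robᵩ's answer is the bona fide standard for intersections, you might actually be looking for rough diff-like output: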

from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")

That said, this behaves differently from what you asked for (order matters), even though it produces the same output in this case.
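
Each line produced by Differ.compare starts with a two-character code: "  " for lines common to both sequences, "- " for lines only in the first, "+ " for lines only in the second, and "? " for hint lines that appear in neither input. As a rough sketch, the lines can be sorted into three lists by that prefix; the helper name split_by_prefix and the sample lists are assumptions made for this illustration.

```python
from difflib import Differ


def split_by_prefix(a, b):
    """Split Differ output into (common, removed, added) lists of lines."""
    common, removed, added = [], [], []
    for line in Differ().compare(a, b):
        tag, text = line[:2], line[2:]
        if tag == "  ":
            common.append(text)
        elif tag == "- ":
            removed.append(text)
        elif tag == "+ ":
            added.append(text)
        # "? " hint lines are skipped on purpose.
    return common, removed, added


# Hypothetical sample data standing in for f1.readlines() and f2.readlines().
old = ["BT:\n", "TS_Player_Search_Radius = 4\n", "Ball_Template_Update = 0\n"]
new = ["Tracking_ROI_Size = 4\n", "Ball_Template_Update = 0\n"]

common, removed, added = split_by_prefix(old, new)
print(common)  # ['Ball_Template_Update = 0\n']
```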

score 4 · Accepted Answer

Once a file object has been iterated over, it is exhausted.

>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
6
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print(line)
...
1

2

3

# exhausted, another iteration does not produce anything.
>>> for line in f: print(line)
...
>>>

Use file.seek (or close and reopen the file) to rewind the file:

>>> f.seek(0)
0
>>> for line in f: print(line)
...
1

2

3
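
Applied to the nested loops in the question, this means the inner loop over file2 only runs for the first line of file1 unless file2 is rewound on each pass. A minimal sketch of that fix follows; io.StringIO objects stand in for the real files, and the function name common_lines_nested is invented for this example.

```python
import io


def common_lines_nested(file1, file2):
    """Collect the non-blank lines of file1 that also appear in file2."""
    matches = []
    for line1 in file1:
        file2.seek(0)  # rewind, otherwise file2 is exhausted after the first pass
        for line2 in file2:
            if line1.strip() and line1 == line2:
                matches.append(line1)
                break  # one match is enough for this line
    return matches


# In-memory stand-ins for the two open files.
f1 = io.StringIO("BT:\n\nBall_Template_Update = 0\n")
f2 = io.StringIO("Width = 1280\n\nBall_Template_Update = 0\n")

print(common_lines_nested(f1, f2))  # ['Ball_Template_Update = 0\n']
```

Rereading file2 for every line of file1 is quadratic in the file sizes, so for large files it is usually cheaper to read one file into a set once.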

score 1 · Accepted Answer

Try this:

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

# Read both files into lists of lines without the trailing newlines
with open(filename1) as f1, open(filename2) as f2:
    file1list = f1.read().splitlines()
    file2list = f2.read().splitlines()

if len(file1list) == len(file2list):
    # Compare the lines pairwise, position by position
    for line1, line2 in zip(file1list, file2list):
        if line1 == line2:
            print(line1 + "==" + line2)
        else:
            print(line1 + "!=" + line2 + " Not-Equal")
else:
    print("difference in the size of the file and number of lines")

score 0 · Accepted Answer

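I have just been faced with the same challenge, but I thought "Why program this in Python if you can solve it with a simple grep?" That led to the following Python code: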

import subprocess
from subprocess import PIPE

try:
    # Lines of file2.txt that match no entry of file1.txt, then the reverse direction
    output1, errors1 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file1.txt", "c:\\file2.txt"], stdout=PIPE, stderr=PIPE).communicate()
    output2, errors2 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file2.txt", "c:\\file1.txt"], stdout=PIPE, stderr=PIPE).communicate()
    if len(output1) + len(output2) + len(errors1) + len(errors2) > 0:
        print("Compare result : There are differences:")
        if len(output1) + len(output2) > 0:
            print("  Output differences : ")
            print(output1)
            print(output2)
        if len(errors1) + len(errors2) > 0:
            print(" Errors : ")
            print(errors1)
            print(errors2)
    else:
        print("Compare result : Both files are equal")
except Exception as ex:
    print("Compare result : Exception during comparison")
    print(ex)
    raise

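The trick behind this is the following: grep -Fvf file1.txt file2.txt prints the lines of file2.txt that match none of the entries in file1.txt, so empty output means every line of file2.txt is present in file1.txt. By doing this in both directions we can see whether the contents of the two files are "equal". I put "equal" in quotes because duplicate lines are disregarded in this way of working. Note also that -F matches fixed strings anywhere within a line; add -x if only whole-line matches should count.

Obviously, this is just an example: you can replace grep with any command-line file comparison tool.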
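
For comparison, a similar "same set of lines" check can be sketched in pure Python with sets. This is not an exact equivalent of the grep command above: it compares whole lines rather than substrings, and the helper name same_line_set is made up for illustration.

```python
def same_line_set(lines1, lines2):
    """Return True if both inputs contain the same set of lines.

    Duplicates and ordering are ignored, as in the two-way grep check.
    """
    set1 = {line.rstrip("\n") for line in lines1}
    set2 = {line.rstrip("\n") for line in lines2}
    return set1 == set2


print(same_line_set(["a\n", "b\n", "a\n"], ["b\n", "a\n"]))  # True
print(same_line_set(["a\n"], ["a\n", "c\n"]))                # False
```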
