python - Iterate through 2 files in each folder and compare them

Question

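I compare two text files and print the result to a third file. I am trying to make the script I'm running iterate through all the folders in its CWD that contain two text files.

What I have so far: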

import os
import glob

path = './'
for infile in glob.glob( os.path.join(path, '*.*') ):
    print('current file is: ' + infile)
    with open (f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:

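Is this a good way to start the iteration process?

It's not the cleanest code, but it gets the job done. However, I'm pretty sure I need to move the logic out of the read/write methods, and I'm not sure where to start.

What I'm basically trying to do is have the script iterate through all the folders in its CWD, open each folder, compare the two text files inside, write a third text file to that same folder, and then move on to the next one.

Another approach I have tried is the following: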

import os

rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

This outputs the following (which should give you a better idea of the file structure):

Found directory: C:\Python27\test
    test.py
Found directory: C:\Python27\test\asdd
    asd1.txt
    asd2.txt
Found directory: C:\Python27\test\chro
    ch1.txt
    ch2.txt
Found directory: C:\Python27\test\hway
    hw1.txt
    hw2.txt

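Would it be wise to put the comparison logic under for fname in fileList? How do I make sure it compares the two text files inside one particular folder, rather than comparing them with the fnames in the fileList of other folders?

This is the full code I am trying to add this functionality to. I apologize for its Frankenstein nature; I'm still working on a more refined version, but it doesn't work yet.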

from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os



class avs_auto:

    def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
        self.load(input_file1, input_file2, output_file1, output_file2)
        self.compare(output_file1, output_file2)
        self.final(result_file)

    def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
        with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
            frame_rects = defaultdict(list)
            for row in (map(str, line.split()) for line in fin1):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
            frame_rects2 = defaultdict(list)
            for row in (map(str, line.split()) for line in fin2):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects2[frame].append(id)
                frame_rects2[frame].append(rect)

        with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
            for frame, rects in sorted(frame_rects.iteritems()):
                fout1.write('{{{}:{}}}\n'.format(frame, rects))
            for frame, rects in sorted(frame_rects2.iteritems()):
                fout2.write('{{{}:{}}}\n'.format(frame, rects))


    def compare(self, fileOut1, fileOut2):
        with open(fileOut1+'.txt', 'r') as fin1:
            with open(fileOut2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
                diff_lines = [l.strip() for l in lines1 if l not in lines2]
                diffs = defaultdict(list)
                with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
                    for line in diff_lines:
                        d = eval(line)
                        for k in d:
                            list_ids = d[k]
                            for i in range(0, len(d[k]), 2):
                                diffs[d[k][i]].append(k)
                    for id_ in diffs:
                        diffs[id_].sort()
                        for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                            group = map(itemgetter(1), g)
                            result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))


    def final(self, result_file):
        with open(result_file+'.txt', 'r') as fin:
            lines = (line.split() for line in fin)
            for k, g in groupby(lines, itemgetter(0)):
                fst = next(g)
                lst = next(iter(deque(g, 1)), fst)
                with open('final/{}.avs'.format(k), 'w') as fout:
                    fout.write('video0=ImageSource("old\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video1=ImageSource("new\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video0=BilinearResize(video0,640,480)\n')
                    fout.write('video1=BilinearResize(video1,640,480)\n')
                    fout.write('StackHorizontal(video0,video1)\n')
                    fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))

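With the load_and_compare() function, I define two input text files, two output text files, one file for the comparison results, and a final stage that writes many files, one for each difference.

What I want to do is have the whole class run on the current working directory and go through every subfolder, compare the two text files, and write everything into that same folder, in particular the final() results.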

score 2 · Accepted Answer

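You can indeed use os.walk(), since it already separates the directories from the files. You only need the directories it returns, because that is where you will look for your 2 specific files.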

You could also use os.listdir(), but it returns directories and files in the same list, so you would have to check for directories yourself.
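A minimal sketch of that manual check, assuming a hypothetical helper name list_subdirs (it is not part of the code in the question): each entry returned by os.listdir() is tested with os.path.isdir() and only the directories are kept.

```python
import os


def list_subdirs(path):
    """Return the sorted names of the immediate subdirectories of path."""
    return sorted(
        name for name in os.listdir(path)
        # os.listdir() returns bare names, so join them with path first
        if os.path.isdir(os.path.join(path, name))
    )


if __name__ == "__main__":
    for subdir in list_subdirs("."):
        print("Found directory: " + subdir)
```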

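Either way, once you have the directories, you can iterate over them (for subdir in dirnames).

Assuming there are also directories that do not contain the 2 specific files, it is best to wrap the open() call in a try..except block, so that directories where one (or both) of the files is missing are ignored.

Finally, if you use os.walk(), you can easily choose whether you only want to go one level deep or traverse the entire depth of the tree. In the former case, you just clear the dirnames list with dirnames[:] = []. Note that dirnames = [] will not work, because that only creates a new empty list and binds the variable to it instead of clearing the old list.

Replace print("do something ...") with your program logic.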

#!/usr/bin/env python

import errno
import os

f1 = "test1"
f2 = "test2"

path = "."
for dirpath, dirnames, _ in os.walk(path):
    for subdir in dirnames:
        filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in (f1, f2)]
        try:
            with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
                print("do something with " + str(fin1) + " and " + str(fin2))
        except IOError as e:
            # ignore directories that don't contain the 2 files
            if e.errno != errno.ENOENT:
                # reraise exception if different from "file or directory doesn't exist"
                raise

    # comment the next line out if you want to traverse all subsubdirectories
    dirnames[:] = []
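The effect of clearing dirnames in place can be checked on a small throwaway tree. The following is only an illustrative sketch; the helper name walk_dirs and the temporary directory layout are assumptions made for the demonstration.

```python
import os
import tempfile


def walk_dirs(root, prune):
    """Collect the directories os.walk() visits, relative to root.

    If prune is True, dirnames is emptied in place at every level,
    so os.walk() never descends below root.
    """
    visited = []
    for dirpath, dirnames, _ in os.walk(root):
        visited.append(os.path.relpath(dirpath, root))
        if prune:
            dirnames[:] = []   # mutates the very list os.walk() keeps using
            # dirnames = []    # would only rebind the local name: no effect
    return sorted(visited)


if __name__ == "__main__":
    # Build a small tree: <root>/a/b
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "a", "b"))

    print(walk_dirs(root, prune=True))   # only the root is visited
    print(walk_dirs(root, prune=False))  # root, a, and a/b are visited
```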

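Edit:

Based on your comment, I hope I now understand your question better.

Please try the following snippet. The overall structure stays the same, only now I use the file names returned by os.walk(). Unfortunately, this also makes it harder to do things like "only go into subdirectories 1 level deep", so I hope walking the tree recursively is fine for you. If not, I will have to add some code later.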

#!/usr/bin/env python

import fnmatch
import os

filter_pattern = "*.txt"

path = "."
for dirpath, dirnames, filenames in os.walk(path):
    # comment this out if you don't want to filter
    filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]

    if len(filenames) == 2:
        # comment this out if you don't want the 2 filenames to be sorted
        filenames.sort(key=str.lower)

        filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
        with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
            print("do something with " + str(fin1) + " and " + str(fin2))

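I'm still not sure what your program logic does, so you will have to interface the two yourself.

However, I noticed that you explicitly append the ".txt" extension to file names throughout your code, so depending on how you use the snippet, you may or may not need to strip the ".txt" extension before handing the file names over. This can be done by inserting the following line before or after the sort: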

        filenames = [os.path.splitext(fn)[0] for fn in filenames]
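One possible way to wire this together is sketched below. It is not the asker's actual logic: the function name compare_pairs, the simple "lines only in the first file" comparison, and the ".diff" result extension are all assumptions. The result is written into the same folder as the two inputs, and the ".diff" extension keeps it from being picked up by the "*.txt" filter on a later run.

```python
import fnmatch
import os


def compare_pairs(root, pattern="*.txt"):
    """For each folder under root holding exactly two files that match
    pattern, write the lines found only in the first file to a result
    file in that same folder. Returns the result paths written."""
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = sorted(
            (fn for fn in filenames if fnmatch.fnmatch(fn, pattern)),
            key=str.lower,
        )
        if len(names) != 2:
            continue  # skip folders without exactly two matching files

        path1, path2 = [os.path.join(dirpath, fn) for fn in names]
        with open(path1) as fin1, open(path2) as fin2:
            lines2 = set(fin2)
            only_in_first = [line for line in fin1 if line not in lines2]

        # Hypothetical naming scheme: <base1>x<base2>.diff next to the inputs
        base1, base2 = [os.path.splitext(fn)[0] for fn in names]
        result_path = os.path.join(dirpath, base1 + "x" + base2 + ".diff")
        with open(result_path, "w") as fout:
            fout.writelines(only_in_first)
        written.append(result_path)
    return written


if __name__ == "__main__":
    for result in compare_pairs("."):
        print("wrote " + result)
```

Because dirpath is joined into every path, the two files are always compared with each other and never with files from a different folder.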

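Also, I still don't understand why you use eval(). Do the text files contain Python code? In any case, eval() should be avoided and replaced by code that is more specific to the task at hand.

If it is a comma-separated list of strings, use line.split(",") instead.

If there may be whitespace before or after the commas, use [word.strip() for word in line.split(",")] instead.

If it is a comma-separated list of integers, use [int(num) for num in line.split(",")] instead; for floats it works analogously.

And so on.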
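The alternatives above can be collected into small helpers, sketched below with hypothetical function names. The last helper is an addition that the answer itself does not mention: if the lines really do hold Python literals (as the '{frame:[...]}' lines written by load() appear to), the standard library's ast.literal_eval() parses them without executing arbitrary code.

```python
import ast


def parse_strings(line):
    """Split a comma-separated line into strings, trimming whitespace."""
    return [word.strip() for word in line.split(",")]


def parse_ints(line):
    """Split a comma-separated line into integers."""
    return [int(num) for num in line.split(",")]


def parse_floats(line):
    """Split a comma-separated line into floats."""
    return [float(num) for num in line.split(",")]


def parse_literal(line):
    """Parse a line holding a Python literal (dict, list, number, string).

    Unlike eval(), ast.literal_eval() accepts only literals and raises
    an exception for anything else, such as a function call.
    """
    return ast.literal_eval(line.strip())


if __name__ == "__main__":
    print(parse_strings("a , b,c"))
    print(parse_ints("1, 2,3"))
    print(parse_literal("{12:['id1', ['1', '2', '3', '4']]}"))
```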
