python - How do I concatenate text files in Python?

Question

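I have a list of 20 file names, like ['file1.txt', 'file2.txt', ...]. I want to write a Python script to concatenate these files into a new file. I could open each file with f = open(...), read it line by line by calling f.readline(), and write each line into the new file. That doesn't seem very "elegant" to me, especially the part where I have to read and write line by line.

Is there a more "elegant" way to do this in Python?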

score 301 · Accepted Answer

This should do it.

For large files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

For small files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

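...and another interesting one that I thought of:

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    # Python 3: the built-in map is lazy (Python 2 used itertools.imap here)
    for line in itertools.chain.from_iterable(map(open, filenames)):
        outfile.write(line)

Sadly, this last method leaves a few open file descriptors, which the GC should take care of anyway. I just thought it was interesting.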
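If the open file descriptors bother you, a small generator keeps the "one stream of lines" idea while closing each input as soon as it is exhausted. This is only a sketch; the helper names iter_lines and concatenate_lazily are made up for illustration and are not part of the answer above.

```python
def iter_lines(filenames):
    """Yield every line of every file, closing each file once it is exhausted."""
    for fname in filenames:
        with open(fname) as infile:
            # Only one input file is open at any time.
            yield from infile


def concatenate_lazily(filenames, output_path):
    """Write the lines of all input files to output_path, one line at a time."""
    with open(output_path, 'w') as outfile:
        outfile.writelines(iter_lines(filenames))
```

Calling concatenate_lazily(filenames, 'path/to/output/file') gives the same result as the large-file loop above.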

score 236 · Accepted Answer

Use shutil.copyfileobj.

It reads the input files chunk by chunk for you, which is more efficient, and it works even when some of the input files are too large to fit into memory:

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
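The same loop can be wrapped in a function that exposes the buffer size, since shutil.copyfileobj accepts an optional third argument for the chunk length. A minimal sketch; the function name concatenate_binary and the 1 MiB default are assumptions chosen for illustration.

```python
import shutil


def concatenate_binary(filenames, output_path, chunk_size=1024 * 1024):
    """Append the raw bytes of each input file to output_path, chunk by chunk."""
    with open(output_path, 'wb') as wfd:
        for fname in filenames:
            with open(fname, 'rb') as fd:
                # Copies at most chunk_size bytes per read, so memory use stays bounded.
                shutil.copyfileobj(fd, wfd, chunk_size)
```

Opening both sides in binary mode means no newline or encoding translation happens, so the output is a byte-for-byte concatenation.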

score 65 · Accepted Answer

This is exactly what fileinput is for:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

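For this use case, it's really not much simpler than iterating over the files manually, but in other cases, having a single iterator that walks over all of the files as if they were one file is very handy. (Also, the fact that fileinput closes each file as soon as it's done means there's no need to with or close each one, but that only saves a line, so it's not a big deal.)

There are some other nifty features in fileinput, like the ability to modify files in place just by filtering each line.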
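As an illustration of the in-place feature, here is a small sketch that strips trailing whitespace from every line of the given files. The function name strip_trailing_whitespace is hypothetical; the mechanism is the inplace=True argument of fileinput.input, which redirects standard output into the file currently being read (context-manager use needs Python 3.2 or later).

```python
import fileinput


def strip_trailing_whitespace(filenames):
    """Rewrite each file in place with trailing whitespace removed from every line."""
    with fileinput.input(files=filenames, inplace=True) as fin:
        for line in fin:
            # While inplace=True is active, print() writes into the current file.
            print(line.rstrip())
```

Anything printed inside the loop ends up in the file, so debugging output should go to sys.stderr instead.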

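As noted in the comments, and discussed in another post, the fileinput code above will not work on Python 2.7. Here is a slight modification to make the code Python 2.7 compliant:

import fileinput

with open('outfilename', 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()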

score 8 · Accepted Answer

I don't know about elegance, but this works:

import glob
import os
for f in glob.glob("file*.txt"):
    os.system("cat "+f+" >> OutFile.txt")

score 6 · Accepted Answer

What's wrong with UNIX commands? (Assuming you're not working on Windows):

ls | xargs cat | tee output.txt does the job (you can call it from Python with subprocess if you want).
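If you do want to call it from Python, one way is to run cat directly and send its output to a file, which avoids building a shell command string. This is a sketch that assumes a Unix-like system with cat on the PATH; the function name concatenate_with_cat is made up for illustration.

```python
import subprocess


def concatenate_with_cat(filenames, output_path):
    """Run `cat file1 file2 ...` and redirect its standard output to output_path."""
    with open(output_path, 'wb') as outfile:
        # Passing a list (no shell=True) means file names are never parsed by a shell.
        subprocess.run(['cat', *filenames], stdout=outfile, check=True)
```

With check=True, a missing input file raises subprocess.CalledProcessError instead of silently producing a partial output.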

score 5 · Accepted Answer

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

A simple benchmark shows that shutil performs better.
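The timings above come without the surrounding harness, so here is one way such a comparison could be set up. It is a sketch, not the code that produced those numbers: the function names and the use of time.perf_counter are assumptions, and results will vary with file sizes and disk caching.

```python
import shutil
import time


def concat_with_read(filenames, output_path):
    """Concatenate by reading each whole file into memory first."""
    with open(output_path, 'wb') as outfile:
        for fname in filenames:
            with open(fname, 'rb') as infile:
                outfile.write(infile.read())


def concat_with_copyfileobj(filenames, output_path, chunk_size=1024 * 1024 * 10):
    """Concatenate with shutil.copyfileobj and a 10 MiB buffer."""
    with open(output_path, 'wb') as wfd:
        for fname in filenames:
            with open(fname, 'rb') as fd:
                shutil.copyfileobj(fd, wfd, chunk_size)


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start
```

For example, time_call(concat_with_read, filenames, 'a.out') and time_call(concat_with_copyfileobj, filenames, 'b.out') can be compared over several runs.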

score 3 · Accepted Answer

An alternative to @inspectorG4dget's answer (the top answer as of March 29, 2016). I tested with 3 files of 436MB each.

@inspectorG4dget solution: 162 seconds

The following solution: 125 seconds

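from subprocess import Popen
filenames = ['file1.txt', 'file2.txt', 'file3.txt']
# Build a Windows "type" command that concatenates every input file
command = "type " + " ".join(filenames) + " > file4results.txt"
with open('batch.bat', 'w') as fbatch:
    fbatch.write(command)
# Run the batch file from the folder that contains the input files
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

The idea is to create a batch file and execute it, taking advantage of "good old technology". It's only semi-Python, but it works faster. Works on Windows.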

score 3 · Accepted Answer

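If you have a lot of files in the directory, then using glob2 to generate the list of filenames, rather than writing them by hand, might be a better option.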

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')
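One thing to watch with a '*.txt' pattern is that the output file matches it too, so a second run would fold the previous output back into the result. The sketch below uses the standard library's glob module (no third-party package needed), sorts the matches for a predictable order, and skips the output file. The function name concatenate_matching is made up for illustration.

```python
import glob
import os


def concatenate_matching(pattern, output_path):
    """Concatenate all files matching pattern into output_path, skipping the output itself."""
    output_abs = os.path.abspath(output_path)
    # glob.glob returns matches in arbitrary order, so sort them explicitly.
    filenames = sorted(
        f for f in glob.glob(pattern) if os.path.abspath(f) != output_abs
    )
    with open(output_path, 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                outfile.write(infile.read())
    return filenames
```

The list of matches is built before the output file is opened, and the function returns it so the caller can see exactly which files were included.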

score 2 · Accepted Answer

Check out the .read() method of the File object:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

You could do something like:

concat = ""
for file in files:
    concat += open(file).read()

or a more "elegant" Python way:

concat = ''.join([open(f).read() for f in files])

which, according to this article, is also the fastest: http://www.skymind.com/~ocrow/python_string/

score 2 · Accepted Answer

If the files are not very large:

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write('\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

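If the files are too big to be read entirely and held in RAM, the algorithm must be a little different: read each file to be copied in a loop, in chunks of fixed length, using read(10000) for example.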
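A minimal sketch of that chunked variant, assuming the same binary-mode approach as the snippet above; the function name concatenate_in_chunks is made up, and 10000 is simply the chunk length mentioned in the text.

```python
def concatenate_in_chunks(filenames, output_path, chunk_size=10000):
    """Copy each input file to output_path in fixed-size chunks."""
    with open(output_path, 'wb') as newf:
        for filename in filenames:
            with open(filename, 'rb') as hf:
                while True:
                    chunk = hf.read(chunk_size)
                    if not chunk:
                        # read() returns an empty bytes object at end of file.
                        break
                    newf.write(chunk)
```

At most chunk_size bytes are held in memory at a time, regardless of how large the input files are.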

score 0 · Accepted Answer

import os

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    # List the files that will be concatenated
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + "  " + infile)
    # Join the contents of every file in the input directory
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        fo.write(concat)

if __name__ == "__main__":
    concatFiles()

score -2 · Accepted Answer

import os
files = os.listdir()
print(files)
print('#', tuple(files))
name = input('Enter the inclusive file name: ')
exten = input('Enter the type(extension): ')
filename = name + '.' + exten
with open(filename, 'w') as output_file:
    for i in files:
        print(i)
        with open(i, 'r') as f_j:
            # Read once and reuse the text for both printing and writing
            content = f_j.read()
        print(content)
        output_file.write(content)
