python - Concatenating gzip files using Python on Windows

Question

Is there a memory-efficient way to concatenate gzip-compressed files using Python on Windows, without decompressing them?

According to a comment on this answer, it should be as simple as:

cat file1.gz file2.gz file3.gz > allfiles.gz

But how do I do this with Python on Windows?

score 8 · Accepted Answer

Just keep writing to the same file.

import shutil

# '...' stands for the output path; filenames is the list of .gz inputs
with open(..., 'wb') as wfp:
  for fn in filenames:
    with open(fn, 'rb') as rfp:
      shutil.copyfileobj(rfp, wfp)
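As a minimal sketch of the same idea wrapped in a function, the block below builds two small gzip files, concatenates their raw bytes, and reads the result back with the standard gzip module. The function name concat_files and the file names are hypothetical and chosen only for illustration.

```python
import gzip
import os
import shutil
import tempfile


def concat_files(filenames, dest_path):
    """Append the raw bytes of each input file to dest_path, in order."""
    with open(dest_path, 'wb') as wfp:
        for fn in filenames:
            with open(fn, 'rb') as rfp:
                # copyfileobj copies chunk by chunk, so whole files are never held in memory
                shutil.copyfileobj(rfp, wfp)


# Throwaway demonstration files (hypothetical names).
workdir = tempfile.mkdtemp()
parts = []
for i, text in enumerate([b'first\n', b'second\n']):
    path = os.path.join(workdir, 'part%d.gz' % i)
    with gzip.open(path, 'wb') as f:
        f.write(text)
    parts.append(path)

combined = os.path.join(workdir, 'allfiles.gz')
concat_files(parts, combined)

# The gzip module reads all members of a concatenated file as one stream.
with gzip.open(combined, 'rb') as f:
    result = f.read()
```

Opening every file in binary mode ('rb' and 'wb') matters on Windows, where text mode would translate line endings and corrupt the compressed data.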

score 1 · Answer

You don't need Python to copy many files into one. You can use the standard Windows "copy" command for this:

copy file1.gz /b + file2.gz /b + file3.gz /b allfiles.gz

Or, more simply:

copy *.gz /b allfiles.gz

However, if you want to use Python, Ignacio's answer is the better option.
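For readers who want the wildcard behaviour from Python, here is a rough sketch using glob. The function name concat_matching is hypothetical, and processing the files in sorted name order is an assumption made here rather than a statement about how "copy" orders them. The output file is skipped explicitly because it can match the same pattern as the inputs.

```python
import glob
import gzip
import os
import shutil
import tempfile


def concat_matching(pattern, dest_path):
    """Concatenate files matching pattern into dest_path, in sorted name order."""
    dest_key = os.path.normcase(os.path.abspath(dest_path))
    # Collect the inputs before creating the output, and never feed the output into itself.
    sources = [p for p in sorted(glob.glob(pattern))
               if os.path.normcase(os.path.abspath(p)) != dest_key]
    with open(dest_path, 'wb') as out:
        for path in sources:
            with open(path, 'rb') as src:
                shutil.copyfileobj(src, out)
    return sources


# Throwaway demonstration files (hypothetical names).
workdir = tempfile.mkdtemp()
for name, payload in [('b.gz', b'BBB'), ('a.gz', b'AAA')]:
    with gzip.open(os.path.join(workdir, name), 'wb') as f:
        f.write(payload)

dest = os.path.join(workdir, 'allfiles.gz')
pattern = os.path.join(workdir, '*.gz')
used = concat_matching(pattern, dest)
# Second run: allfiles.gz now exists and matches the pattern, but is excluded.
used_again = concat_matching(pattern, dest)

with gzip.open(dest, 'rb') as f:
    combined_text = f.read()
```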

score 1 · Answer

If

cat file1.gz file2.gz file3.gz > allfiles.gz

works, then this should work too:

fileList = ['file1.gz', 'file2.gz', 'file3.gz']
destFilename = 'allfiles.gz'

bufferSize = 8  # Bytes read per iteration. 8 is very small; a larger value (e.g. 64 * 1024) is faster and still uses little memory.

with open(destFilename, 'wb') as destFile:
    for fileName in fileList:
        with open(fileName, 'rb') as sourceFile:
            chunk = True
            while chunk:
                chunk = sourceFile.read(bufferSize)
                destFile.write(chunk)
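One way to check the claim above without loading anything large into memory is to compare a hash of the decompressed inputs against a hash of the decompressed output. This is only a sketch: the helper name digest_of_decompressed, the chunk size, and the file names are assumptions for illustration.

```python
import gzip
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; any positive value works


def digest_of_decompressed(paths, chunk_size=CHUNK_SIZE):
    """SHA-256 of the decompressed contents of paths, read in order, chunk by chunk."""
    digest = hashlib.sha256()
    for path in paths:
        with gzip.open(path, 'rb') as f:
            # iter() with a sentinel stops when read() returns empty bytes at end of file
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()


# Throwaway demonstration files (hypothetical names).
workdir = tempfile.mkdtemp()
parts = []
for i, line in enumerate([b'alpha\n', b'beta\n', b'gamma\n']):
    path = os.path.join(workdir, 'file%d.gz' % (i + 1))
    with gzip.open(path, 'wb') as f:
        f.write(line * 50000)
    parts.append(path)

combined = os.path.join(workdir, 'allfiles.gz')
with open(combined, 'wb') as out:
    for path in parts:
        with open(path, 'rb') as src:
            out.write(src.read())  # the demo files are tiny, so one read is fine here

matches = digest_of_decompressed(parts) == digest_of_decompressed([combined])
```

If matches is True, the concatenated file decompresses to the inputs' contents in the same order.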

score 0 · Answer

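Fortunately, gzip-compressed files can be concatenated directly with the cat command-line tool, but unfortunately there does not seem to be an obvious Python function for this (not in the standard library's gzip module, anyway). I only looked briefly, though; there may be libraries that can do it.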

Nevertheless, one way to do this with the standard library is to call cat using subprocess:

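from subprocess import check_call

# ">" is shell syntax and is not interpreted when a list is passed, so redirect via stdout.
# Assumes a cat executable is on PATH, which is not the case on a stock Windows install.
with open(output_name, 'wb') as out:
    check_call(['cat', file1_path, file2_path], stdout=out)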

To generalize this to an arbitrary number of inputs, you can do the following:

inputs = ['input1', 'input2', ... 'input9001']
output_name = 'output.gz'

# Build the argument list directly and redirect via stdout instead of a shell ">".
with open(output_name, 'wb') as out:
    check_call(['cat'] + inputs, stdout=out)

I hope this helps someone.
