python - How to simulate ZipFile.open in Python 2.5?

Question

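I want to extract a file from a zip to a specific path, ignoring the file's path inside the archive. This is very easy in Python 2.6 (my docstring is longer than the code):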

import shutil
import zipfile

def extract_from_zip(name, dest_path, zip_file):
    """Similar to zipfile.ZipFile.extract but extracts the file given by name
    from the zip_file (instance of zipfile.ZipFile) to the given dest_path
    *ignoring* the filename path given in the archive completely
    instead of preserving it as extract does.
    """
    dest_file = open(dest_path, 'wb')
    archived_file = zip_file.open(name)
    shutil.copyfileobj(archived_file, dest_file)
    dest_file.close()


extract_from_zip('path/to/file.dat', 'output.txt', zipfile.ZipFile('test.zip', 'r'))

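But in Python 2.5 the ZipFile.open method is not available. I couldn't find a solution on Stack Overflow, but this forum post had a nice one: it uses ZipInfo.file_offset to seek to the right point in the zip and then zlib.decompressobj to decompress the bytes from there. Unfortunately, ZipInfo.file_offset was removed in Python 2.5!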

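So, given that all we have in Python 2.5 is ZipInfo.header_offset, the local file header has to be parsed and skipped over by hand. Using Wikipedia as a reference (I know), I came up with this much longer and not very elegant solution.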

import struct
import zipfile
import zlib

def extract_from_zip(name, dest_path, zip_file):
    """Python 2.5 version :("""
    dest_file = open(dest_path, 'wb')
    info = zip_file.getinfo(name)
    if info.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    else:
        # Python 2.x spells this exception BadZipfile (lowercase "f").
        raise zipfile.BadZipfile("Unrecognized compression method")

    # Seek over the fixed size fields to the "file name length" field in
    # the file header (26 bytes). Unpack this and the "extra field length"
    # field ourselves as info.extra doesn't seem to be the correct length.
    zip_file.fp.seek(info.header_offset + 26)
    file_name_len, extra_len = struct.unpack("<HH", zip_file.fp.read(4))
    zip_file.fp.seek(info.header_offset + 30 + file_name_len + extra_len)

    bytes_to_read = info.compress_size

    while True:
        buff = zip_file.fp.read(min(bytes_to_read, 102400))
        if not buff:
            break
        bytes_to_read -= len(buff)
        if decoder:
            buff = decoder.decompress(buff)
        dest_file.write(buff)

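    if decoder:
        # Feed one dummy pad byte before flushing (the same trick
        # ZipFile.read uses) so zlib does not choke on the end of the stream.
        dest_file.write(decoder.decompress('Z'))
        dest_file.write(decoder.flush())
    dest_file.close()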

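Note that I unpack and read the "extra field length" field myself, because calling len on the ZipInfo.extra attribute gives a value that is 4 bytes too small, which makes the offset calculation come out wrong. Perhaps I'm missing something here?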
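One possible explanation for the mismatch, offered as an assumption rather than something confirmed in this thread: ZipInfo objects are filled in from the archive's central directory, and the extra field recorded there need not be the same length as the extra field in the local file header that sits in front of the data. The sketch below reads both length fields from the local header and checks the resulting offset against a stored member. The helper name local_data_offset is made up for illustration, and the snippet is written for a current Python rather than 2.5.

```python
import io
import struct
import zipfile

LOCAL_HEADER_FIXED_SIZE = 30   # fixed-size part of a local file header
NAME_LEN_FIELD_OFFSET = 26     # the two 2-byte length fields start here


def local_data_offset(fp, header_offset):
    """Return the offset of a member's data, read from its local header."""
    fp.seek(header_offset)
    fixed = fp.read(LOCAL_HEADER_FIXED_SIZE)
    if fixed[:4] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name_len, extra_len = struct.unpack(
        "<HH", fixed[NAME_LEN_FIELD_OFFSET:LOCAL_HEADER_FIXED_SIZE])
    return header_offset + LOCAL_HEADER_FIXED_SIZE + name_len + extra_len


# Build a small archive in memory with one uncompressed member.
buf = io.BytesIO()
archive = zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED)
archive.writestr("path/to/file.dat", b"hello zip")
archive.close()

# Locate the member's data using only header_offset and the local header.
archive = zipfile.ZipFile(buf, "r")
info = archive.getinfo("path/to/file.dat")
offset = local_data_offset(buf, info.header_offset)
buf.seek(offset)
payload = buf.read(info.compress_size)
```

Because the member is stored uncompressed, the bytes read at the computed offset should be the original contents.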

Can anyone improve on this solution for Python 2.5?

Edit: I should have said that the obvious solution, as suggested by ChrisAdams,

dest_file.write(zip_file.read(name))

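fails with a MemoryError for any reasonably sized file in the zip, because it tries to load the whole file into memory in one go. I have big files, so I need to stream the contents out to disk.

Also, upgrading Python is the obvious solution, but it is entirely out of my hands and essentially impossible.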

score 4 · Accepted Answer

I haven't tested this, but I use something very similar in Python 2.4:

import zipfile

def extract_from_zip(name, dest_path, zip_file):
    dest_file = open(dest_path, 'wb')
    dest_file.write(zip_file.read(name))
    dest_file.close()

extract_from_zip('path/to/file/in/archive.dat', 
        'output.txt', 
        zipfile.ZipFile('test.zip', 'r'))

score 1 · Accepted Answer

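I know I'm a bit late to this question, but I ran into exactly the same problem.

The solution I used was to copy the Python 2.6.6 version of zipfile into a folder (I called it python_fix) and import that instead: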

python_fix/zipfile.py

Then in the code:

import python_fix.zipfile as zipfile

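From there I was able to use the 2.6.6 version of zipfile with the Python 2.5.1 interpreter (the 2.7.X versions fail on this interpreter because of their use of "with").

Hope this helps anyone else stuck on ancient technology.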
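A variation on this, sketched under the assumption that the same code may also run on newer interpreters: prefer the standard library module and fall back to the vendored copy only when ZipFile.open is missing. The python_fix package name comes from the text above; the hasattr check and the has_open variable are illustrative additions.

```python
import zipfile

# Fall back to the vendored 2.6.6 copy only when the interpreter's own
# zipfile module lacks ZipFile.open (the case on Python 2.5).
# Assumption: python_fix/ contains an __init__.py so it imports as a package.
if not hasattr(zipfile.ZipFile, "open"):
    import python_fix.zipfile as zipfile

# After this point, callers can rely on zipfile.ZipFile.open existing.
has_open = hasattr(zipfile.ZipFile, "open")
```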

score 0 · Accepted Answer

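Given my constraints, it looks like the answer was already in my question: parse the ZipFile structure yourself, and use zlib.decompressobj to decompress the bytes once you have found them.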
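The decompression half of that approach can be sketched on its own. This is a minimal illustration, not the code from the question: the function name inflate_stream and the chunk size are made up, the input is raw deflate data produced in memory rather than read from a zip, and the dummy pad byte used in the question's code is left out.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary illustrative value


def inflate_stream(src, dst, compressed_size, chunk_size=CHUNK_SIZE):
    """Copy compressed_size bytes of raw deflate data from src to dst,
    decompressing chunk by chunk so the whole file is never in memory."""
    decoder = zlib.decompressobj(-zlib.MAX_WBITS)  # negative: no zlib header
    remaining = compressed_size
    while remaining > 0:
        chunk = src.read(min(remaining, chunk_size))
        if not chunk:
            raise EOFError("compressed data ended early")
        remaining -= len(chunk)
        dst.write(decoder.decompress(chunk))
    dst.write(decoder.flush())


# Round trip on in-memory data: compress as raw deflate, then stream it back.
original = b"streamed to disk in pieces " * 10000
compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(original) + compressor.flush()

out = io.BytesIO()
inflate_stream(io.BytesIO(raw), out, len(raw), chunk_size=1024)
```

In real use, src would be the zip's file object positioned at the start of the member's data and dst the destination file opened in 'wb' mode.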

If you don't have (or suffer from) my constraints, there are better answers:

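If you can, just upgrade Python 2.5 to 2.6 (or later!), as suggested in Daenyth's comment.
If the zip only contains small files that can be loaded 100% into memory, use ChrisAdams' answer.
If you can introduce a dependency on an external utility, make an appropriate system call to /usr/bin/unzip or similar, as suggested in Vlad's answer.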
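For the last option, a rough sketch of what such a system call might look like, assuming the Info-ZIP unzip tool is installed at the path mentioned above. Its -p option writes a member to standard output, so the member's path inside the archive never touches the file system. The function names are invented for illustration, and this is not taken from Vlad's answer.

```python
import subprocess

UNZIP = "/usr/bin/unzip"  # path taken from the text above; adjust as needed


def build_unzip_command(zip_path, member_name):
    """Return the argv list for writing one member to stdout (unzip -p)."""
    return [UNZIP, "-p", zip_path, member_name]


def extract_with_unzip(zip_path, member_name, dest_path):
    """Stream one member to dest_path, ignoring its path in the archive."""
    dest_file = open(dest_path, "wb")
    try:
        status = subprocess.call(build_unzip_command(zip_path, member_name),
                                 stdout=dest_file)
    finally:
        dest_file.close()
    if status != 0:
        raise RuntimeError("unzip exited with status %d" % status)


# The command that would be run for the example from the question.
command = build_unzip_command("test.zip", "path/to/file.dat")
```

One caveat: unzip treats member names as wildcard patterns, so names containing characters such as * or [ may need escaping.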
