python - Safely extract zip or tar using Python

Question

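I am trying to extract user-submitted zip and tar files into a directory. The documentation for zipfile's extractall method (and similarly for tarfile's extractall) notes that paths may be absolute or contain .. components that lead outside the destination path. Instead, I could use extract myself, like this: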

import zipfile

some_path = '/destination/path'
some_zip = '/some/file.zip'
zipf = zipfile.ZipFile(some_zip, mode='r')
for subfile in zipf.namelist():
    zipf.extract(subfile, some_path)

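Is this safe? Is it possible for a file in the archive to end up outside some_path in this case? If so, how can I make sure that files never end up outside the destination directory?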

score 44 · Accepted Answer

Note: Starting with Python 2.7.4, this is a non-issue for ZIP archives. Details are at the bottom of the answer. This answer focuses on tar archives.

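To find out where a path really points, use os.path.abspath() (but note the caveat about symlinks as path components). If you normalize a path from your zipfile with abspath and the result does not have the current directory as a prefix, it points outside it.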
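A minimal sketch of that prefix check follows. The helper name is_inside is made up for illustration, and os.path.commonpath is used here in place of a raw string-prefix test so that a sibling such as /destination/path-evil is not accepted as being inside /destination/path. Like abspath itself, it does not look at symlinks.

```python
import os

def is_inside(base, member_name):
    """Return True if member_name, joined onto base, stays inside base."""
    base = os.path.abspath(base)
    # os.path.join discards base when member_name is absolute
    target = os.path.abspath(os.path.join(base, member_name))
    return os.path.commonpath([base, target]) == base

print(is_inside("/destination/path", "docs/readme.txt"))   # True
print(is_inside("/destination/path", "../../etc/passwd"))  # False
print(is_inside("/destination/path", "/etc/passwd"))       # False
```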

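But you also need to check the value of any symlink extracted from your archive (both tarfiles and unix zipfiles can store symlinks). This is important if you are worried about a proverbial "malicious user" who would intentionally bypass your security, rather than an application that simply installs itself in system libraries.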

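That's the aforementioned caveat: abspath will be misled if your sandbox already contains a symlink that points to a directory. Even a symlink that points inside the sandbox can be dangerous: the symlink sandbox/subdir/foo -> .. points to sandbox, so the path sandbox/subdir/foo/../.bashrc should be disallowed. The easiest way to do so is to wait until the previous files have been extracted and use os.path.realpath(). Fortunately extractall() accepts a generator, so this is easy to do.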
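To make the caveat concrete, here is a small sketch that recreates the sandbox/subdir/foo -> .. layout in a temporary directory and compares the two functions. It assumes a platform where os.symlink is permitted (for example Linux or macOS). The variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    sandbox = os.path.join(tmp, "sandbox")
    os.makedirs(os.path.join(sandbox, "subdir"))
    # sandbox/subdir/foo -> ..  (so foo points at sandbox itself)
    os.symlink("..", os.path.join(sandbox, "subdir", "foo"))

    tricky = os.path.join(sandbox, "subdir", "foo", "..", ".bashrc")
    lexical = os.path.abspath(tricky)    # collapses "foo/.." textually
    physical = os.path.realpath(tricky)  # follows the symlink first

    print(lexical == os.path.join(sandbox, "subdir", ".bashrc"))  # True
    print(physical == os.path.join(tmp, ".bashrc"))               # True
```

abspath reports a location inside the sandbox, while realpath shows that the file would actually be written next to the sandbox directory.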

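Since you asked for code, here's a bit that explicates the algorithm. It prohibits not only the extraction of files to locations outside the sandbox (which is what was requested), but also the creation of links inside the sandbox that point to locations outside the sandbox. I'm curious to hear if anyone can sneak any stray files or links past it.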

import sys
import tarfile
from os.path import abspath, realpath, dirname, join as joinpath

def resolved(path):
    return realpath(abspath(path))

def badpath(path, base):
    # joinpath will ignore base if path is absolute
    target = resolved(joinpath(base, path))
    # Compare against base plus a trailing separator, so that a sibling
    # such as "/sandbox-evil" is not accepted as being inside "/sandbox"
    return target != base and not target.startswith(joinpath(base, ""))

def badlink(info, base):
    # Links are interpreted relative to the directory containing the link
    tip = resolved(joinpath(base, dirname(info.name)))
    return badpath(info.linkname, base=tip)

def safemembers(members, dest="."):
    # Resolve names against the directory the archive is extracted into
    base = resolved(dest)

    for finfo in members:
        if badpath(finfo.name, base):
            print(finfo.name, "is blocked (illegal path)", file=sys.stderr)
        elif finfo.issym() and badlink(finfo, base):
            print(finfo.name, "is blocked: Symlink to", finfo.linkname, file=sys.stderr)
        elif finfo.islnk() and badlink(finfo, base):
            print(finfo.name, "is blocked: Hard link to", finfo.linkname, file=sys.stderr)
        else:
            yield finfo

with tarfile.open("testtar.tar") as ar:
    ar.extractall(path="./sandbox", members=safemembers(ar, "./sandbox"))

Edit: Starting with Python 2.7.4, this is a non-issue for ZIP archives: the method zipfile.extract() prohibits the creation of files outside the sandbox:

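Note: If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows, illegal characters (:, <, >, |, ", ?, and *) [are] replaced by underscore (_).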
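A quick way to see this sanitization in action is sketched below: it writes an archive whose only member is named ../escaped.txt and extracts it into a temporary directory. The file and directory names are made up for the demonstration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "evil.zip")
    dest = os.path.join(tmp, "dest")
    os.mkdir(dest)

    # Store a member whose name tries to climb out of the destination
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("../escaped.txt", "payload")

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)

    inside = os.path.exists(os.path.join(dest, "escaped.txt"))
    outside = os.path.exists(os.path.join(tmp, "escaped.txt"))
    print(inside, outside)  # True False
```

The ".." component is dropped, so the file lands in dest rather than in its parent.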

The tarfile class has not been similarly sanitized, so the above answer still applies.
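For readers on newer interpreters: Python 3.12 added opt-in extraction filters to tarfile, and the "data" filter rejects members that would land outside the destination. The sketch below assumes such an interpreter. The helper name extract_with_data_filter is hypothetical, and it deliberately refuses to extract when the filter is unavailable rather than falling back to unfiltered extraction.

```python
import io
import os
import tarfile
import tempfile

def extract_with_data_filter(archive_path, dest):
    """Extract with the stdlib 'data' filter; refuse to run without it."""
    if not hasattr(tarfile, "data_filter"):
        raise RuntimeError("this Python has no tarfile extraction filters")
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, filter="data")

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "evil.tar")
    dest = os.path.join(tmp, "dest")
    os.mkdir(dest)

    # Build an archive whose only member tries to climb out of dest
    with tarfile.open(archive, "w") as tar:
        member = tarfile.TarInfo("../escaped.txt")
        payload = b"payload"
        member.size = len(payload)
        tar.addfile(member, io.BytesIO(payload))

    try:
        extract_with_data_filter(archive, dest)
        rejected = False
    except (tarfile.TarError, RuntimeError) as exc:
        rejected = True
        print("refused:", exc)

    escaped = os.path.exists(os.path.join(tmp, "escaped.txt"))
```

On interpreters without the filter, the generator approach shown earlier in this answer remains the fallback.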

score 3 · Answer

Copy the zip file into an empty directory. Then use os.chroot to make that directory the root directory. Then unzip there.

Alternatively, you can call unzip itself with the -j flag, which ignores the directories:

import subprocess
filename = '/some/file.zip'
rv = subprocess.call(['unzip', '-j', filename])

score 3 · Answer

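Use ZipFile.infolist()/TarFile.next()/TarFile.getmembers() to get the information for each entry in the archive, normalize the path, open the file yourself, use ZipFile.open()/TarFile.extractfile() to get a file-like object for the entry, and copy the entry data yourself.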
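For the zip case, those steps might look roughly like the sketch below. The function name extract_zip_manually is made up, and the policy of silently skipping offending entries is an assumption; raising an error would be an equally reasonable choice.

```python
import os
import shutil
import zipfile

def extract_zip_manually(zip_path, dest):
    """Copy each regular entry into dest, skipping names that would leave it."""
    dest = os.path.realpath(dest)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories are created on demand below
            target = os.path.realpath(os.path.join(dest, info.filename))
            if os.path.commonpath([dest, target]) != dest:
                continue  # entry would land outside dest: skip it
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Open the output file ourselves and copy the entry's bytes
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

Because the output file is opened by your own code, the archive never gets to decide where the data is written. A tar version would follow the same shape with TarFile.getmembers() and TarFile.extractfile().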

score 3 · Answer

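Contrary to the popular answer, unzipping files safely is not completely solved as of Python 2.7.4. The extractall method is still dangerous and can lead to path traversal, either directly or through the unzipping of symbolic links. Here is my final solution, which should prevent both attacks in all versions of Python, even versions prior to Python 2.7.4, where the extract method was vulnerable: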

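import os
import zipfile

def safe_unzip(zip_file, extract_path='.'):
    base = os.path.realpath(extract_path)
    with zipfile.ZipFile(zip_file, 'r') as zf:
        for member in zf.infolist():
            file_path = os.path.realpath(os.path.join(extract_path, member.filename))
            # Compare against base plus a trailing separator, so that a
            # sibling such as "/out-evil" is not accepted as inside "/out"
            if file_path == base or file_path.startswith(os.path.join(base, '')):
                zf.extract(member, extract_path)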

Edit 1: Fixed a variable name clash. Thanks to Juuso Ohtonen.

Edit 2: s/abspath/realpath/g. Thanks to TheLizzard.
