
I'm trying to zip a folder in Python 3 using the zipfile module.

Since I'm German, some of my file names contain umlauts (äöü).

When zipping, I get a UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed.

The offending character is ü.
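For context: on Linux, Python decodes file names from raw bytes using the filesystem encoding (normally UTF-8) with the `surrogateescape` error handler. Each byte that is not valid UTF-8 becomes a lone surrogate in the range U+DC80..U+DCFF. So '\udcfc' stands for the raw byte 0xFC, which is ü in Latin-1 / cp1252. The minimal sketch below uses a made-up file name:

```python
# Hypothetical file name stored on disk as Latin-1 bytes (0xFC == "ü" in Latin-1).
raw = b"Brosch\xfcre.pdf"

# This is how Python on a UTF-8 Linux system turns undecodable bytes into str.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'Brosch\udcfcre.pdf'

# A lone surrogate cannot be encoded as strict UTF-8, which is what zipfile tries.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# Interpreting the same bytes as Latin-1 shows the intended character.
print(raw.decode("latin-1"))  # Broschüre.pdf
```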

How can I get zipfile to compress all of my files?

The relevant code is:

import os
import zipfile

def zipdir(path, ziph):
    # Add every file below `path` to the open ZipFile `ziph`
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))

if __name__ == '__main__':
    zipf = zipfile.ZipFile('path/to/destination', 'w', zipfile.ZIP_DEFLATED)
    zipdir('path/to/folder', zipf)
    zipf.close()

Edit:
The same happens when I use shutil.make_archive:

import shutil

shutil.make_archive('/path/to/destination', 'zip', '/path/to/folder')

Full stack trace for shutil.make_archive():

Traceback (most recent call last):
  File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
    return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 59: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 39, in <module>
    archive_dir(path, zip_fullpath)
  File "run.py", line 19, in archive_dir
    shutil.make_archive(dest, 'zip', source)
  File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 822, in make_archive
    filename = func(base_name, base_dir, **kwargs)
  File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 720, in _make_zipfile
    zf.write(path, path)
  File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
    with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
  File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
    return self._open_to_write(zinfo, force_zip64=force_zip64)
  File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
    self.fp.write(zinfo.FileHeader(zip64))
  File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
    filename, flag_bits = self._encodeFilenameFlags()
  File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 59: surrogates not allowed

Full stack trace for zipfile:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
    return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 95: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    zipdir(path, zipf)
  File "run.py", line 16, in zipdir
    ziph.write(filepath)
  File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
    with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
  File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
    return self._open_to_write(zinfo, force_zip64=force_zip64)
  File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
    self.fp.write(zinfo.FileHeader(zip64))
  File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
    filename, flag_bits = self._encodeFilenameFlags()
  File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed
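Both tracebacks end in `ZipInfo._encodeFilenameFlags`. zipfile first tries to store the name as ASCII. If that fails, it stores it as UTF-8 and sets flag bit 0x800. A name containing a lone surrogate fails both. The error can be reproduced in memory, without touching the file system. The sketch below uses a hypothetical file name:

```python
import io
import zipfile

# Hypothetical name as os.walk would return it for the raw bytes b"geh\xe4use.pdf".
bad_name = "geh\udce4use.pdf"

caught = None
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    try:
        # writestr() builds the local file header through the same
        # ZipInfo.FileHeader() call that write() uses in the tracebacks above.
        zf.writestr(bad_name, b"dummy data")
    except UnicodeEncodeError as exc:
        caught = exc

print(caught.encoding, "-", caught.reason)  # utf-8 - surrogates not allowed
```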

Update:

I've tried some of the solutions that seemed to work for some people in the linked post. Here's what I have:
ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1')) gives:

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    zipdir(path, zipf)
  File "run.py", line 16, in zipdir
    ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1'))
  File "/usr/lib64/python3.7/zipfile.py", line 1713, in write
    zinfo = ZipInfo.from_file(filename, arcname)
  File "/usr/lib64/python3.7/zipfile.py", line 506, in from_file
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/some/path/to/documents/DIS_Broschüre_DE.pdf'

So the encode/decode round trip returns a path that doesn't exist on the file system.
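Why the file can't be found: `.decode('ISO-8859-1')` produces the right-looking name, but it is a different string from the one os.walk returned. When Python opens that path, it re-encodes 'ü' with the filesystem encoding (UTF-8 here) as the two bytes C3 BC, while the file on disk contains the single byte FC. The sketch below uses the file name from the traceback and assumes a UTF-8 filesystem encoding:

```python
# Name as os.walk returns it: the on-disk byte 0xFC became the surrogate '\udcfc'.
walked = "DIS_Brosch\udcfcre_DE.pdf"

# The attempted fix from the question: re-read the raw bytes as Latin-1.
fixed = walked.encode("utf-8", "surrogateescape").decode("ISO-8859-1")
print(fixed)  # DIS_Broschüre_DE.pdf

# Bytes actually on disk vs. the bytes Python looks up when opening `fixed`.
on_disk = walked.encode("utf-8", "surrogateescape")
looked_up = fixed.encode("utf-8")
print(on_disk)               # b'DIS_Brosch\xfcre_DE.pdf'
print(looked_up)             # b'DIS_Brosch\xc3\xbcre_DE.pdf'
print(on_disk == looked_up)  # False
```

The repaired string is therefore usable as the name inside the archive, but not as the path to read from.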

Another option, ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8')), gives me:

Traceback (most recent call last):
  File "run.py", line 41, in <module>
    zipdir(path, zipf)
  File "run.py", line 16, in zipdir
    ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 96: invalid start byte
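The two attempts together suggest a workaround. Read each file from the path exactly as os.walk returned it, and pass a repaired string only as `arcname`. The repair tries UTF-8 first, so correctly encoded umlauts are left alone, and falls back to a Windows code page. This is a sketch, not the answer's resolution. `repair_name` is a hypothetical helper, and both the cp1252 fallback and the UTF-8 filesystem encoding (as on Linux) are assumptions:

```python
import os
import zipfile


def repair_name(name, fallback="cp1252"):
    """Hypothetical helper: turn an os.walk name into a str zipfile can encode."""
    # Recover the raw on-disk bytes (assumes a UTF-8 filesystem encoding, as on Linux).
    raw = name.encode("utf-8", "surrogateescape")
    try:
        # Correctly encoded names (e.g. a real UTF-8 "ü") come back unchanged.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: broken names were produced by a Windows / Latin-1 client.
        return raw.decode(fallback, "replace")


def zipdir(path, ziph):
    """Add every file below `path` to the open ZipFile `ziph`."""
    for root, dirs, files in os.walk(path):
        for file in files:
            filepath = os.path.join(root, file)
            # Read the file under its real (possibly broken) name,
            # but store it in the archive under the repaired name.
            ziph.write(filepath, arcname=repair_name(filepath))
```

It is called the same way as the original, e.g. `zipdir('path/to/folder', zipf)` inside `with zipfile.ZipFile('path/to/destination', 'w', zipfile.ZIP_DEFLATED) as zipf:`. Files on disk keep their broken names. Only the archive entries are renamed.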
Answer:

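OK, I found the problem. The files in question are not what I thought they were. Ordinary umlauts work fine. Somehow, these file names are actually corrupted, like this: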

Running ls in one of the directories gives:
2e_geh�usetechnologie_flyer_qrcode.pdf

Command-line autocompletion gives me:
2e_geh$'\344'usetechnologie_flyer_qrcode.pdf
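The autocompletion output confirms this. `\344` is octal for the byte 0xE4, which is ä in Latin-1 / cp1252 but is not valid UTF-8 on its own. Such names can also be found from Python. On Linux, os.walk maps undecodable bytes to lone surrogates in U+DC80..U+DCFF, so any name containing one was not valid UTF-8 on disk. This is a minimal sketch, and `has_undecodable_bytes` and `find_broken_names` are hypothetical helper names:

```python
import os


def has_undecodable_bytes(name):
    """True if `name` contains a surrogate escape, i.e. its raw bytes were not valid UTF-8."""
    return any("\udc80" <= ch <= "\udcff" for ch in name)


def find_broken_names(path):
    """Yield full paths under `path` whose file or directory name is not valid UTF-8."""
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            if has_undecodable_bytes(name):
                yield os.path.join(root, name)


# The byte that shell autocompletion shows as $'\344':
print(hex(0o344))                       # 0xe4
print(bytes([0o344]).decode("cp1252"))  # ä
```

When printing the results, use `print(ascii(p))` or `repr(p)`. A plain `print(p)` of a string containing a lone surrogate raises the same kind of UnicodeEncodeError on a UTF-8 terminal.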

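Since these files were uploaded through a web interface, my best guess is that they were created on Windows or some other non-UNIX OS, and the web server couldn't handle the encoding.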

Other uploaded files have correct umlauts. I'm not sure what happened there, but I'm glad neither Python nor the Linux file system is to blame.

Thanks for all the hints.

Answered 2019-09-28T23:15:30.123