python - zlib.error: Error -3 while decompressing: incorrect header check

Question

I have a gzip file and I am trying to read it in Python as shown below:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

score 142 · Accepted Answer

You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

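This is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).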
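To see which of the three cases applies, it can help to look at the first two bytes of the data. The sketch below is only a heuristic and not part of the zlib API: the function name sniff_format is hypothetical, and raw deflate cannot be positively identified because it has no header at all.

```python
import zlib

def sniff_format(data: bytes) -> str:
    """Guess the container format from the first two bytes (heuristic only)."""
    if len(data) < 2:
        return "unknown"
    # RFC 1952: every gzip member starts with the magic bytes 0x1f 0x8b.
    if data[0] == 0x1F and data[1] == 0x8B:
        return "gzip"
    # RFC 1950: the low nibble of CMF is 8 (deflate) and the 16-bit value
    # CMF*256 + FLG is a multiple of 31.
    if data[0] & 0x0F == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        return "zlib"
    # Raw deflate (RFC 1951) has no header, so it can only be assumed.
    return "raw deflate or not compressed"
```

If this reports something other than "gzip" for a file named abc.gz, the file extension is misleading and a different wbits value is needed.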

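Choosing windowBits

But zlib can decompress all of those formats:

to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).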
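The three wbits values listed above can be wrapped in a small lookup table. This is a minimal sketch: the names WBITS, compress_as and decompress_as are made up for illustration, while the zlib calls themselves are the standard library ones.

```python
import zlib

# wbits value per format; the dictionary keys are hypothetical labels.
WBITS = {
    "deflate": -zlib.MAX_WBITS,    # raw deflate, RFC 1951 (no header)
    "zlib": zlib.MAX_WBITS,        # zlib wrapper, RFC 1950
    "gzip": zlib.MAX_WBITS | 16,   # gzip wrapper, RFC 1952
}

def compress_as(data: bytes, fmt: str) -> bytes:
    """Compress data into the requested container format."""
    compressor = zlib.compressobj(wbits=WBITS[fmt])
    return compressor.compress(data) + compressor.flush()

def decompress_as(data: bytes, fmt: str) -> bytes:
    """Decompress data that is known to be in the given format."""
    return zlib.decompress(data, WBITS[fmt])
```

For example, decompress_as(compress_as(b"test", "gzip"), "gzip") gives back the original bytes, while passing the wrong format name raises zlib.error.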

Examples

Test data (shown for Python 3, where the input must be bytes):

>>> import zlib
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = b'''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()

The obvious test for zlib:

>>> zlib.decompress(zlib_data)
b'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
b'test'

Test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
b'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)  # StringIO.StringIO for Python 2
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
b'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits triggers header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
b'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
b'test'

Using gzip instead

Or you can ignore zlib and use the gzip module directly; but keep in mind that under the hood, gzip uses zlib.

import gzip

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
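If the file only carries a .gz name but does not contain gzip data, the gzip module reports that on the first read. A minimal sketch of handling this case follows; the helper name read_gzip is hypothetical, and it treats any OSError (which also covers a missing file) as "not readable as gzip".

```python
import gzip

def read_gzip(path):
    """Return the decompressed bytes, or None if the file cannot be read as gzip."""
    try:
        # The with-block closes the file even if read() raises.
        with gzip.open(path, 'rb') as fh:
            return fh.read()
    except OSError:
        # gzip.BadGzipFile (Python 3.8+) is a subclass of OSError;
        # a missing file (FileNotFoundError) ends up here as well.
        return None
```

A None result for abc.gz would suggest that the data is in zlib or raw deflate format instead, which is where the wbits values come in.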

score 4 · Accepted Answer

Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module; the code below is straight from the Python documentation.

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

score 3 · Accepted Answer

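I just solved the "incorrect header check" problem when uncompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version).

Yes, this can be very frustrating. A typical shallow reading of the documentation presents Zlib as an API for Gzip compression, but by default (when not using the gz* methods) it does not create or uncompress the Gzip format. You have to pass this not very prominently documented flag.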

score 2 · Accepted Answer

To decompress incomplete gzipped bytes that are in memory, dnozay's answer is useful, but it leaves out the call to zlib.decompressobj, which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background on wbits, see the documentation for zlib.decompress.

Credit: Yann Vernier's answer, which mentions the zlib.decompressobj call.
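The difference between the two calls can be seen on a deliberately truncated stream. The sketch below builds its own test data (the variable names and the "cut in half" truncation are made up for illustration): one-shot zlib.decompress refuses the incomplete stream, while a decompress object returns whatever it could decode.

```python
import zlib

# Build some gzip data and cut it in half to simulate an incomplete download.
original = b"".join(b"line %d\n" % i for i in range(1000))
compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
gzipped = compressor.compress(original) + compressor.flush()
truncated = gzipped[: len(gzipped) // 2]

# One-shot decompression insists on a complete stream.
try:
    zlib.decompress(truncated, zlib.MAX_WBITS | 16)
    one_shot_failed = False
except zlib.error:
    one_shot_failed = True

# A decompress object hands back the part it managed to decode.
decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
partial = decompressor.decompress(truncated)

print(one_shot_failed)               # True
print(original.startswith(partial))  # True: partial is a prefix of the input
print(decompressor.eof)              # False: the end of the stream was never seen
```

Checking decompressor.eof afterwards is one way to tell a complete result from a partial one.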

score 2 · Accepted Answer

This does not answer the original question, but it may help someone else who ends up here.

The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))

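The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (instead of a BinaryField).

When a CharField value is read from the database, str() is called on the value without an explicit encoding, as can be seen in the Django source.

The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

So, in the example, we are inadvertently base64-decoding

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').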
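The two conversions can be compared side by side. This is a small sketch using the same input as the example; the variable names wrong, right and restored are only for illustration.

```python
import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))

# Without an encoding, str() returns the repr of the bytes object,
# i.e. text that itself starts with b' and ends with '.
wrong = str(b64_encoded_bytes)

# With an encoding, str() decodes the bytes into the Base64 text.
right = str(b64_encoded_bytes, 'utf-8')  # same as b64_encoded_bytes.decode('utf-8')

# Only the properly decoded text round-trips through Base64 and zlib.
restored = zlib.decompress(base64.b64decode(right))
print(restored)  # b'abcde'
```

The extra b' prefix in wrong shifts the Base64 data, so the decoded bytes no longer start with a valid zlib header.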

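Note specific to Django:

What is especially tricky: this issue only arises when the value is retrieved from the database. See, for example, the test below, which passes (in Django 3.0.3):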

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

score 1 · Accepted Answer

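Funnily enough, I ran into this error when trying to work with the Stack Overflow API using Python.

I managed to get it working with the GzipFile object from the gzip module, roughly like this: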

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))

file_contents = gzip_file.read()

score 1 · Accepted Answer

My case was decompressing email messages stored in a Bullhorn database. The snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')

for msg in cursor.fetchall():
    # The magic is in the second parameter: a negative value selects raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

score -3 · Accepted Answer

Just add the header 'Accept-Encoding': 'identity'

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849
