zip - How do we estimate the "overhead" of a compressed file?

Question

For example, suppose we compress a 7-byte .txt file. After compressing it into a .zip file, the result is 190 bytes.

Is there a way to estimate or calculate the approximate size of the "overhead"?

What factors affect the size of the overhead?

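zlib's overhead calculation: they say, "... the only expansion is an overhead of 5 bytes per 16 KB block (about 0.03%), plus a one-time overhead of 6 bytes for the entire stream."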

I mention this site only because it shows that the "overhead" size can be estimated.

Note: by overhead I mean the extra data added on top of the compressed version of the data.

score 3 · Accepted Answer

From the ZIP format ..

Assuming that there is only one central directory and no comments and no extra fields, the overhead should be similar to the following. (The overhead will only go up if any additional metadata is added.)

Per file (Local file header) - 30+len(filename)
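Per file (Data descriptor) - 12 (16 with the optional signature; only present when general purpose flag bit 3 is set, i.e. sizes are written after the data)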
Per file (Central directory header) - 46+len(filename)
Per archive (EOCD) - 22
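The fixed sizes above can be checked against the record layouts in PKWARE's APPNOTE. The sketch below encodes the fixed-length part of each record as a little-endian Python `struct` format and lets `struct.calcsize` add up the fields; the dictionary name and its keys are made up for illustration, and the variable-length parts (file name, extra field, comment) are deliberately left out.

```python
import struct

# Fixed-length parts of the ZIP records. "<" = little-endian with no
# alignment padding; I = 4-byte field, H = 2-byte field.
ZIP_RECORD_FORMATS = {
    # signature, version needed, flags, method, mod time, mod date,
    # CRC-32, compressed size, uncompressed size, name length, extra length
    "local_file_header": "<IHHHHHIIIHH",
    # CRC-32, compressed size, uncompressed size
    "data_descriptor": "<III",
    # same, preceded by the optional 4-byte signature
    "data_descriptor_with_signature": "<IIII",
    # signature, version made by, version needed, flags, method, mod time,
    # mod date, CRC-32, compressed size, uncompressed size, name length,
    # extra length, comment length, disk number start, internal attributes,
    # external attributes, local header offset
    "central_directory_header": "<IHHHHHHIIIHHHHHII",
    # signature, disk number, disk with central directory, entries on this
    # disk, total entries, central directory size, central directory offset,
    # comment length
    "end_of_central_directory": "<IHHHHIIH",
}


def fixed_sizes() -> dict:
    """Return the fixed (non-variable) size in bytes of each record type."""
    return {name: struct.calcsize(fmt) for name, fmt in ZIP_RECORD_FORMATS.items()}


if __name__ == "__main__":
    for name, size in fixed_sizes().items():
        print(f"{name}: {size} bytes")
```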

So the overhead, where afn is the average length of all file names, and f is the number of files:

  f * ((30 + afn) + 12 + (46 + afn)) + 22
= f * (88 + 2 * afn) + 22
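The same estimate can be written as a small helper. This is a sketch under the answer's assumptions (one central directory, no comments, no extra fields); the function names are hypothetical. Summing the actual name lengths gives the same result as `f * (88 + 2 * afn) + 22`, and `descriptor_size` can be set to 0 for archivers that do not write data descriptors. Name lengths are counted in encoded bytes, since that is what the headers store.

```python
def zip_overhead(filenames, descriptor_size=12):
    """Estimate ZIP overhead in bytes for the given member names.

    Assumes one central directory, no archive/file comments, no extra
    fields, and a data descriptor of `descriptor_size` bytes per file.
    """
    total = 22  # end of central directory record, once per archive
    for name in filenames:
        name_len = len(name.encode("utf-8"))  # headers store encoded bytes
        total += 30 + name_len   # local file header + file name
        total += descriptor_size  # data descriptor
        total += 46 + name_len   # central directory header + file name
    return total


def zip_overhead_avg(f, afn):
    """The closed form: f * (88 + 2 * afn) + 22."""
    return f * (88 + 2 * afn) + 22


if __name__ == "__main__":
    print(zip_overhead(["hello world note.txt"]))
    print(zip_overhead_avg(100, 14))
```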

This of course makes ZIP a very poor choice for very tiny bits of compressed data where a (file) structure or metadata is not required - zlib, on the other hand, is a very thin Deflate wrapper.
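The 6-byte one-time figure quoted in the question is the zlib wrapper itself: a 2-byte header and a 4-byte Adler-32 trailer around the raw Deflate stream (RFC 1950). A quick way to see it, sketched with Python's standard `zlib` module, is to compress the same input once with the wrapper and once as raw Deflate (`wbits=-15`) and compare lengths; the helper name is illustrative. With the same level and default memory/strategy settings, the Deflate payload should be identical, so the difference is just the wrapper.

```python
import zlib


def zlib_wrapper_overhead(data: bytes, level: int = 6) -> int:
    """Bytes the zlib wrapper adds on top of the raw Deflate stream."""
    # zlib format: 2-byte header + Deflate data + 4-byte Adler-32 checksum
    wrapped = zlib.compress(data, level)
    # Negative wbits (-15) requests a raw Deflate stream with no header/trailer
    raw = zlib.compressobj(level, zlib.DEFLATED, -15)
    raw_stream = raw.compress(data) + raw.flush()
    return len(wrapped) - len(raw_stream)


if __name__ == "__main__":
    print(zlib_wrapper_overhead(b"example"))
```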

For small payloads, a poor Deflate implementation may also result in a significantly larger "compressed" size, such as the notorious .NET implementation ..

Examples:

Storing 1 file, with name "hello world note.txt" (len = 20),

= 1 * (88 + 2 * 20) + 22 = 150 bytes overhead

Storing 100 files, with an average name of 14 letters,

= 100 * (88 + 2 * 14) + 22 = 11622 bytes overhead
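The estimate can also be checked empirically. The sketch below uses Python's standard `zipfile` to build an uncompressed (`ZIP_STORED`) archive in memory and subtracts the payload size; the helper name is hypothetical. When writing to a seekable buffer, `zipfile` goes back and fills in the sizes in the local header instead of writing a data descriptor, and it adds no extra fields for small members, so for the first example above it should print 138, which is 150 minus the 12-byte descriptor. Other archivers may add extra fields (for example extended timestamps) or descriptors, which would push the figure higher, as in the question's 190-byte archive.

```python
import io
import zipfile


def measured_overhead(files: dict) -> int:
    """Build a ZIP_STORED archive in memory and return how many bytes
    it adds on top of the raw file contents."""
    buf = io.BytesIO()  # seekable, so no data descriptors are written
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    payload = sum(len(data) for data in files.values())
    return len(buf.getvalue()) - payload


if __name__ == "__main__":
    # 7-byte file, as in the question
    print(measured_overhead({"hello world note.txt": b"7 bytes"}))
```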
