block - Bzip2 block header: 1AY&SY

Question

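This is a question about the bzip2 archive format. Every bzip2 archive consists of a file header, one or more blocks, and a trailer structure. Each block should start with "1AY&SY", which is 0x314159265359, the first twelve digits of pi BCD-encoded in 6 bytes. According to the bzip2 source: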

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/

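The question is: is it true that all bzip2 archives have their blocks aligned to a byte boundary? I mean all archives created by the reference implementation of bzip2, the bzip2-1.0.5+ utility.

I suspect that bzip2 parses the stream not as a byte stream but as a bitstream (the block itself is Huffman-encoded, which by design is not byte-aligned).

So, in other words: will grep -c 1AY&SY be greater than (Huffman coding may produce 1AY&SY inside a block) or equal to the number of bzip2 blocks in the file?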

score 4 · Accepted Answer

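bzip2 treats the data as a bitstream, so blocks are generally not byte-aligned.

From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:

Anyway, the important bit is that a BZIP2 file contains one or more "streams", which are byte aligned, each containing one (zero?) or more "blocks", which are not byte aligned, followed by an end of stream marker (the six bytes 0x177245385090, which is the square root of pi as a binary coded decimal (BCD), a four byte checksum, and empty bits for byte alignment).

The bzip2 Wikipedia article also mentions that blocks are bit-aligned (see the File format section), which matches what I remember from school (we had to implement the algorithm...).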
