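This is a question about the bzip2 archive format. Every bzip2 archive consists of a file header, one or more blocks, and a trailer structure. Every block should start with "1AY&SY", which is the 6-byte BCD encoding of the digits of pi, 0x314159265359. According to the bzip2 source: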
/*--
A 6-byte block header, the value chosen arbitrarily
as 0x314159265359 :-). A 32 bit value does not really
give a strong enough guarantee that the value will not
appear by chance in the compressed datastream. Worst-case
probability of this event, for a 900k block, is about
2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
For a compressed file of size 100Gb -- about 100000 blocks --
only a 48-bit marker will do. NB: normal compression/
decompression do *not* rely on these statistical properties.
They are only important when trying to recover blocks from
damaged files.
--*/
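The question is: is it true that all bzip2 archives have their blocks aligned to byte boundaries? I mean all archives created by the reference implementation, the bzip2-1.0.5+ utility.
I suspect that bzip2 parses the stream not as a byte stream but as a bit stream (the block itself is Huffman-coded, and Huffman coding is not byte-aligned by design).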
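So, in other words: will the count reported by grep -c '1AY&SY' be greater than or equal to the number of bzip2 blocks in the file (greater, because Huffman coding may produce 1AY&SY inside a block), or can it be lower?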
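
One way to check this empirically is to search for the 48-bit marker at every bit offset rather than every byte offset. The following is a minimal sketch in Python, not taken from the bzip2 source: the function name `find_block_magic_bit_offsets` is hypothetical, and it assumes bits are packed most-significant-bit first within each byte, which is how the reference implementation writes its output. Note also that `grep -c` counts matching lines rather than individual occurrences, so it is only a rough proxy for the byte-aligned count.

```python
import bz2

# 48-bit block header value quoted in the bzip2 source comment above.
BLOCK_MAGIC = 0x314159265359
BLOCK_MAGIC_BYTES = BLOCK_MAGIC.to_bytes(6, "big")  # b"1AY&SY"
MASK48 = (1 << 48) - 1


def find_block_magic_bit_offsets(data):
    """Return the bit offset of every occurrence of the block marker.

    Hypothetical helper: slides a 48-bit window over the input one bit
    at a time, assuming MSB-first bit packing within each byte.
    """
    offsets = []
    window = 0
    for bit_index in range(len(data) * 8):
        byte = data[bit_index >> 3]
        bit = (byte >> (7 - (bit_index & 7))) & 1
        window = ((window << 1) | bit) & MASK48
        # The window holds 48 valid bits once bit_index reaches 47.
        if bit_index >= 47 and window == BLOCK_MAGIC:
            offsets.append(bit_index - 47)
    return offsets


def count_byte_aligned(data):
    """Count occurrences of b"1AY&SY" that start on a byte boundary."""
    return data.count(BLOCK_MAGIC_BYTES)
```

Running both functions on the same archive and comparing the results shows how many markers a byte-oriented search would miss: any offset returned by `find_block_magic_bit_offsets` that is not a multiple of 8 is invisible to a byte-level search. The first block always follows the 4-byte file header, so its marker sits at bit offset 32 and is byte-aligned. Matches inside compressed data are possible in principle, so the bit-level count is an upper bound on the block count rather than an exact figure.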