我试图找到 LZ77 的正确实现,这是 1977 年论文中的原始著名算法。我发现有许多不同的实现产生不同的输出,但仍然标记为 LZ77。例如,有些使用哈希表,在 LZRW 或 LZJB 等更“官方”的算法中使用的东西。所以我很困惑。
我测试过的一些实现:
- https://gist.github.com/fogus/5401265(C,742 > 538 字节,哈希表?混乱的输出)
- https://sourceforge.net/projects/crush(C++,742 > 508 字节,哈希表?混乱的输出)
- https://github.com/cstdvd/lz77(C,742 > 642 字节——在输出中包含可读的 ASCII)
- http://lab.polygonpla.net/js/tinylz77.html (JS, 742 > 863 bytes!! -- 输出中包含可读的 ASCII)
- http://geocities.com/diogok_br/lz77/demo.html(JS,742 > 658 字节——在输出中包含可读的 ASCII)
- github.com/olle/lz77-kit/src/main/js/lz77.js(JS,742 > 639 字节——在输出中包含可读的 ASCII)
- https://github.com/Favrito/LZ77(C,742 > 755 字节!!)
据我所知,没有人使用任何后处理编码,如霍夫曼等。
我用来压缩的文本:
Oho! Oho! Rise up, O Teti!
Take your head, collect your bones,
Gather your limbs, shake the earth from your flesh!
Take your bread that rots not, your beer that sours not,
Stand at the gates that bar the common people!
The gatekeeper comes out to you, he grasps your hand,
Takes you into heaven, to your father Geb.
He rejoices at your coming, gives you his hands,
Kisses you, caresses you,
Sets you before the spirits, the imperishable stars...
The hidden ones worship you,
The great ones surround you,
The watchers wait on you,
Barley is threshed for you,
Emmer is reaped for you,
Your monthly feasts are made with it,
Your half-month feasts are made with it,
As ordered done for you by Geb, your father,
Rise up, O Teti, you shall not die!
它们都有不同的输出流。是否没有 LZ77 的纯参考实现或标准可供检查?
为什么不是所有的“LZ77”压缩器都提供相同的压缩比、相同的输出比特流?