algorithm - LZW解压算法

Question

我正在为必须实现 LZW 压缩/解压缩的任务编写程序。我为此使用以下算法：

-压缩

w = NIL;
   while ( read a character k )
       {
         if wk exists in the dictionary
         w = wk;
         else
         add wk to the dictionary;
         output the code for w;
         w = k;
       }

-减压

read a character k;
   output k;
   w = k;
   while ( read a character k )    
  /* k could be a character or a code. */
        {
         entry = dictionary entry for k;
         output entry;
         add w + entry[0] to dictionary;
         w = entry;
        }

对于压缩阶段，我只是输出表示字典条目索引的整数，起始字典也由 ascii 字符（0 - 255）组成。但是当我进入解压阶段时，我会收到此错误，例如，如果我压缩一个仅包含“booop”的文本文件，它将通过这些步骤生成一个输出文件：

w       k       Dictionary          Output

-       b       -                   -
b       o       bo (256)            98 (b)
o       o       oo (257)            111 (o)
o       o       -                   -
oo      p       oop (258)           257 (oo)
p       -       -                   112 (p)

输出.txt：98 111 257 112

然后当我来解压文件时

w       k          entry        output       Dictionary
        98 (b)                  b   
b       111 (o)    o            o             bo (256)
o       257 (error)

257 (oo) 尚未添加。谁能看到我在这里出错的地方，因为我很难过。算法错了吗？

score 8 · Accepted Answer

您的压缩部分正确且完整，但解压缩部分不完整。您只包括代码在字典中的情况。由于解压缩过程总是比压缩过程落后一步，因此解码器有可能找到不在字典中的代码。但是由于它只是落后了一步，它可以弄清楚编码过程接下来会添加什么并正确输出解码后的字符串，然后将其添加到字典中。像这样继续你的解压过程：

-减压

read a character k;
   output k;
   w = k;
   while ( read a character k )    
  /* k could be a character or a code. */
        {
         if k exists in the dictionary
         entry = dictionary entry for k;
         output entry;
         add w + entry[0] to dictionary;
         w = entry;
         else
         output entry = w + firstCharacterOf(w);
         add entry to dictionary;
         w = entry;
        }

然后当你来解压文件，看到257，你发现字典里没有。但是你知道前面的条目是'o'，它的第一个字符也是'o'，把它们放在一起，你会得到“oo”。现在输出 oo 并将其添加到字典中。接下来你得到代码 112 并且确定你知道它是 p。完毕！

w       k          entry        output       Dictionary
        98 (b)                  b   
b       111 (o)    o            o             bo (256)
o       257 (oo)                oo            oo(257)
oo      112(p)                  p

有关更多信息，请参阅： Steve Blackstock 的此解释。一个更好的页面，其中包含 “icafe” Java 图像库 GIF 编码器和解码器所基于的实际解码器和编码器实现的流程图。

score 1 · Accepted Answer

来自http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch您是否陷入这种情况？

如果解码器收到一个尚未在其字典中的代码 Z，会发生什么？由于解码器始终只是编码器后面的一个代码，因此只有当编码器刚刚生成它时，Z 才能在编码器的字典中，当为 χ 发出前一个代码 X 时。因此 Z 编码了一些 ω 即 χ + ?，解码器可以确定未知字符如下：

1) The decoder sees X and then Z.
2) It knows X codes the sequence χ and Z codes some unknown sequence ω.
3) It knows the encoder just added Z to code χ + some unknown character,
4) and it knows that the unknown character is the first letter z of ω.
5) But the first letter of ω (= χ + ?) must then also be the first letter of χ.
6) So ω must be χ + x, where x is the first letter of χ.
7) So the decoder figures out what Z codes even though it's not in the table,
8) and upon receiving Z, the decoder decodes it as χ + x, and adds χ + x to the table as the value of Z.

每当编码器遇到 cScSc 形式的输入时，就会出现这种情况，其中 c 是单个字符，S 是字符串，并且 cS 已经在字典中，但 cSc 不在。编码器发出 cS 的代码，将 cSc 的新代码放入字典中。接下来，它在输入中看到 cSc（从 cScSc 的第二个 c 开始）并发出它刚刚插入的新代码。上面的论点表明，每当解码器接收到不在其字典中的代码时，情况一定是这样的。

尽管 cScSc 形式的输入似乎不太可能，但当输入流具有大量重复特征时，这种模式相当普遍。特别是，单个字符的长字符串（在 LZW 经常用于编码的图像种类中很常见）重复生成这种类型的模式。

对于这种特定情况，维基百科的东西很合适，你有 X+？其中 X 是 (o)，Z 到目前为止是未知的，所以第一个字母是 X，将 (oo) 添加到表中作为 257。我只是在继续我在维基百科上读到的内容，让我们知道结果如何如果这不是解决方案。

algorithm - LZW解压算法

2 回答 2

Related

Reference