c++ - Problem saving Huffman codes?

Question

I want to save Huffman codes to a file. How can I do that? I am storing the Huffman codes in a string, but the resulting file is larger than the original.

score 4 · Accepted Answer

A very simple approach is to write one bit at a time with something like the following:

unsigned char acc = 0; // Accumulator of bits waiting to be written
int bitcount = 0;      // How many bits are already present in the accumulator

// write a single bit (0/1)
void writebit(int bit)
{
    acc |= (bit << bitcount);
    if (++bitcount == 8)
    {
        writebyte(acc);
        acc = 0;
        bitcount = 0;
    }
}

To read back a single bit, the procedure is symmetrical:

unsigned char acc;   // bits waiting to be extracted
int bitcount;        // how many bits are still available in acc

int readbit()
{
   if (bitcount == 0)
   {
       bitcount = 8;
       acc = readbyte();
   }
   --bitcount;
   return (acc >> (7 - bitcount)) & 1;
}

Of course this is just the simplest approach, but I'd wait before worrying about speed until you can save and load your encoded data correctly.
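The two snippets above leave out `writebyte`, `readbyte`, and what happens to a final byte that is only partly filled. The following is a minimal sketch of one way to fill those gaps, assuming the bytes are kept in a `std::vector`. The names `BitWriter`, `BitReader`, `flush`, and `round_trip_demo` are hypothetical and not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration; they are not from the answer.
// Bits are packed least-significant-bit first, matching writebit/readbit above.
struct BitWriter {
    std::vector<unsigned char> bytes; // completed bytes
    unsigned char acc = 0;            // bits waiting to be written
    int bitcount = 0;                 // how many bits are already in acc

    void writebit(int bit) {
        acc |= static_cast<unsigned char>((bit & 1) << bitcount);
        if (++bitcount == 8) flush();
    }

    // Emit a partly filled last byte, padded with zero bits.
    void flush() {
        if (bitcount == 0) return;
        bytes.push_back(acc);
        acc = 0;
        bitcount = 0;
    }
};

struct BitReader {
    const std::vector<unsigned char>& bytes;
    std::size_t pos = 0;   // index of the next byte to load
    unsigned char acc = 0; // bits waiting to be extracted
    int bitcount = 0;      // how many bits are still available in acc

    explicit BitReader(const std::vector<unsigned char>& b) : bytes(b) {}

    int readbit() {
        if (bitcount == 0) {
            acc = bytes.at(pos++); // throws std::out_of_range past the end
            bitcount = 8;
        }
        --bitcount;
        return (acc >> (7 - bitcount)) & 1;
    }
};

// Round trip: write 11 bits, flush, and read the same 11 bits back.
void round_trip_demo() {
    const int bits[] = {1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1};
    BitWriter w;
    for (int b : bits) w.writebit(b);
    w.flush();
    assert(w.bytes.size() == 2); // 11 bits need two bytes
    BitReader r(w.bytes);
    for (int b : bits) assert(r.readbit() == b);
}
```

Because `flush` pads the last byte with zero bits, a decoder cannot tell padding from data on its own. A common remedy is to store the number of symbols (or bits) alongside the data, or to reserve an end-of-stream symbol.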

Example:

Suppose you have the following Huffman coded symbols

A - 0
B - 10
C - 110
D - 111

and that you want to encode the sequence

A B A A C D A B B

then you would call in order

writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(0);                           // A
writebit(0);                           // A
writebit(1); writebit(1); writebit(0); // C
writebit(1); writebit(1); writebit(1); // D
writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(1); writebit(0);              // B

The actual bytes written would therefore be

(01100010) = 0x62
(01010111) = 0x57

(Note that the code shown starts from the least significant bit, i.e. you should read the bit sequences inside the parentheses from right to left if you want to recognize the symbols).
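The two bytes can be checked mechanically. Below is a small sketch that encodes the symbols matching the `writebit` calls listed above (A B A A C D A B B) with the example code table, using the same least-significant-bit-first packing. The function names `encode_example` and `check_worked_example` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: encode a symbol string with the example code table,
// packing bits least-significant-bit first as writebit does.
std::vector<unsigned char> encode_example(const std::string& symbols) {
    static const std::map<char, std::string> codes = {
        {'A', "0"}, {'B', "10"}, {'C', "110"}, {'D', "111"}};
    std::vector<unsigned char> out;
    unsigned char acc = 0;
    int bitcount = 0;
    for (char s : symbols) {
        for (char c : codes.at(s)) { // at() throws on an unknown symbol
            if (c == '1') acc |= static_cast<unsigned char>(1u << bitcount);
            if (++bitcount == 8) {
                out.push_back(acc);
                acc = 0;
                bitcount = 0;
            }
        }
    }
    if (bitcount > 0) out.push_back(acc); // zero-padded final byte
    return out;
}

// The nine symbols produce exactly 16 bits, so two full bytes come out.
void check_worked_example() {
    const std::vector<unsigned char> bytes = encode_example("ABAACDABB");
    assert(bytes.size() == 2);
    assert(bytes[0] == 0x62);
    assert(bytes[1] == 0x57);
}
```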

score 1 · Accepted Answer

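I believe what you are saving is a string of 1s and 0s. A real Huffman code needs to be saved in binary form and parsed afterwards. If you simply save the output as a string, you defeat the purpose of Huffman coding: each 0 and 1 takes 8 bits instead of 1.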
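To make the size difference concrete, here is a rough sketch that takes a string of '0'/'1' characters and writes it to a stream as packed bits. The helper names `write_packed` and `size_comparison_demo` are hypothetical, and the least-significant-bit-first order is an assumption; any order works as long as the reader uses the same one.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a string of '0'/'1' characters as packed bits,
// least significant bit first. Returns the number of bytes written.
std::size_t write_packed(std::ostream& out, const std::string& bits) {
    unsigned char acc = 0;
    int bitcount = 0;
    std::size_t written = 0;
    for (char c : bits) {
        if (c == '1') acc |= static_cast<unsigned char>(1u << bitcount);
        if (++bitcount == 8) {
            out.put(static_cast<char>(acc));
            ++written;
            acc = 0;
            bitcount = 0;
        }
    }
    if (bitcount > 0) { // zero-padded final byte
        out.put(static_cast<char>(acc));
        ++written;
    }
    return written;
}

// Sixteen characters of text shrink to two bytes once packed.
void size_comparison_demo() {
    const std::string bits = "0100011011101010";
    std::ostringstream packed;
    const std::size_t n = write_packed(packed, bits);
    assert(bits.size() == 16);
    assert(n == 2);
    assert(packed.str().size() == 2);
}
```

When the target is a real file, open the `std::ofstream` with `std::ios::binary` so that no newline translation alters the bytes.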

score 1 · Accepted Answer

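You are probably saving a whole byte for each pattern/letter.

Suppose e is the most common letter. It will have the bit pattern 0.

Suppose z is the least common letter; it will have some pattern starting with 1. Let's assign it 1111 111.

The file you want to write looks like this:

0111 1111

You are probably writing this:

0000 0000 0111 1111

You need to use bitwise operations to do this.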
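As an illustration of the bitwise operations involved, the sketch below appends variable-length codes to a byte buffer, filling each byte from its most significant bit so the result reads left to right like the patterns above. The `MsbPacker` type and `pack_e_then_z` function are hypothetical, and the bit order is only a convention: the reader must use the same one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packer for illustration: fills each byte from its most
// significant bit downwards.
struct MsbPacker {
    std::vector<std::uint8_t> bytes;
    int used = 0; // bits already used in the last byte (0 = start a new byte)

    // Append the low `length` bits of `value`, most significant bit first.
    void append(unsigned value, int length) {
        for (int i = length - 1; i >= 0; --i) {
            if (used == 0) bytes.push_back(0);
            const unsigned bit = (value >> i) & 1u;
            bytes.back() |= static_cast<std::uint8_t>(bit << (7 - used));
            used = (used + 1) % 8;
        }
    }
};

// e = 0 (1 bit) followed by z = 1111111 (7 bits) fits in a single byte.
void pack_e_then_z() {
    MsbPacker p;
    p.append(0x00, 1); // e
    p.append(0x7F, 7); // z
    assert(p.bytes.size() == 1);
    assert(p.bytes[0] == 0x7F); // 0111 1111
}
```

Writing each code into its own byte instead would produce the two bytes 0000 0000 0111 1111, which is the doubled output described above.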
