0

我必须建立一个基于霍夫曼算法的压缩器。到目前为止,我设法用每个字符的频率创建了树,并为每个字符生成了一个具有较少位数的表示。

是这样的,对于短语“good this sweetplum”:

'o' 000, '' 001, 't' 0100, 'r' 0101, 'p' 0110, 'm' 0111, 'l' 1000, 'i' 1001, 'h' 1010, 'd' 1011, 'a'1100, 'u' 1101, 'g' 1110, 's' 1111

我现在遇到的问题是找到一种将树保存在存档中的方法,这样我就可以重建它,然后解压缩文件。

有什么建议么?

我做了一些研究,但发现很难理解,所以如果你能详细解释一下,我将不胜感激。


我用来从文件中读取频率的代码是:

int main (int argc, char *argv[])
{
    int i;

    TipoSentinela *sentinela;

    TipoLista *no = NULL;

    Arv *arvore, *arvore2, *arvore3;

    int *repete = (int *) calloc (256, sizeof(int));

    if(argc == 2)
    {
    in = load_base(argv[1]);
    le_dados_arquivo (repete); //read the frequencies from the file
    sentinela = cria_lista (); //create a marker for the tree node list

    for (i = 0; i < 256; i++)
    {   
        if(repete[i] > 0 && i != 0) 
        {
            arvore = arv_cria (Cria_info (i, repete[i])); //create a tree node with the character i and the frequence of it in the file
            no = inicia_lista (arvore, no, sentinela); //create the list of tree nodes
        }
    }

    Ordena (sentinela); //sort the tree nodes list by the frequencies

    for(Seta_primeiro(sentinela); Tamanho_lista(sentinela) != 1; Move_marcador(sentinela))
    {
        Seta_primeiro(sentinela); //put the marker in the first element of the list
        no = Retorna_marcador(sentinela);
        arvore2 = Retorna_arvore (no); //return the tree represented by the list marker
        Move_marcador(sentinela); //put the marker to the next element
        arvore3 = Retorna_arvore (Retorna_marcador (sentinela)); //return the tree represented by the list marker
        arvore = Cria_pai (arvore2, arvore3); //create a tree node that will contain the both arvore2 and arvore3
        Insere_arvoreFinal (sentinela, arvore); //insert the node at the end of the list
        Remove_arvore (sentinela); //remove the node arvore2 from the list
        Remove_arvore (sentinela); //remove the node arvore3 from the lsit
        Ordena (sentinela); //sort the list again

    }

    out = load_out(argv[1]); //open the output file

    Codificacao (arvore); //generate the code from each node of the tree

    rewind(in);

    char c;

    while(!feof(in))
    {
        c = fgetc(in);

        if(c != EOF)
            arvore2 = Procura_info (arvore, c); //search the character c in the tree

        if(arvore2 != NULL)
            imprimebit(Retorna_codigo(arvore2), out); //write the code in the file
    }

    fclose(in);
    fclose(out);
    free(repete);
    arvore = arv_libera (arvore);
    Libera_Lista(sentinela);
    }

    return 0;
}

//bit_counter and cur_byte are global variables
void write_bit (unsigned char bit, FILE *f)
{
    static k = 0;
    if(k != 0)
    {
        if(++bit_counter == 8)
        {
            fwrite(&cur_byte,1,1,f);
            bit_counter = 0;
            cur_byte = 0;
        }
    }

    k = 1;

    cur_byte <<= 1;
    cur_byte |= ('0' != bit);
}

//aux is the code of a character in the tree
void imprimebit(char *aux, FILE *f)
{
    int i, j;

    if(aux == NULL)
        return;

    for(i = 0; i < strlen(aux); i++)
    {
        write_bit(aux[i], f); //write the bits of the code in the file
    }
}

有了这个,我可以在输出文件中编写所有字符的代码,但我也看不到存储树的方法。

4

2 回答 2

1

您不需要发送树。只需发送长度。然后建立一个一致的算法,将长度转换为两端的代码。一致性称为“规范”霍夫曼代码。您按长度对代码进行排序,并在每个长度内按符号排序。然后分配从 0 开始的代码。所以你最终会得到(_ 表示空格):

_ 000
o 001
a 0100
d 0101
g 0110
h 0111
i 1000
l 1001
m 1010
p 1011
r 1100
s 1101
t 1110
u 1111
于 2013-09-06T15:21:03.107 回答
0

我确实找到了一种存储每个字符代码的方法。例如:我写树,从根开始,向左向下,然后向右。所以,如果我的树是这样的

         0
      /    \
    0        1
 /    \     /   \
'a'   'b' 'c'   'd'

我的文件的标题是这样的: 001[8 bits from 'a']1[8 bits from b]01[8 bits from c]1[8 bits from d]

有了这个,我就可以重建我的树了。

我现在的问题是逐位读取文件头以了解我必须创建一个新节点的方向。

于 2013-09-08T15:22:55.913 回答