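I'm having trouble implementing an LZW compressor. The compressor seems to work, but for some input streams it does not write the end-of-stream code (defined as the value 256), so the decompressor loops forever. The compressor code is: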
    int compress1(FILE* input, BIT_FILE* output) {
        CODE next_code;    // next node
        CODE current_code; // current node
        CODE index;        // node of the found character
        int character;
        int ret;

        next_code = FIRST_CODE;
        dictionary_init();
        if ((current_code = getc(input)) == EOF)
            current_code = EOS;
        while ((character = getc(input)) != EOF) {
            index = dictionary_lookup(current_code, (SYMBOL)character);
            if (dictionary[index].code != UNUSED) {
                current_code = dictionary[index].code;
            }
            else {
                if (next_code <= MAX_CODE-1) {
                    dictionary[index].code = next_code++;
                    dictionary[index].parent = current_code;
                    dictionary[index].symbol = (SYMBOL)character;
                }
                else {
                    // handling full dictionary
                    dictionary_init();
                    next_code = FIRST_CODE;
                }
                ret = bit_write(output, (uint64_t) current_code, BITS);
                if (ret != 0)
                    return -1;
                current_code = (CODE)character;
            }
        }
        ret = bit_write(output, (uint64_t) current_code, BITS);
        if (ret != 0)
            return -1;
        ret = bit_write(output, (uint64_t) EOS, BITS);
        if (ret != 0)
            return -1;
        if (bit_close(output) == -1) {
            printf("Ops: error during closing\n");
            return -1;
        }
        return 0;
    }
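CODE and SYMBOL are typedefs for uint32_t and uint16_t respectively, and FIRST_CODE is defined as 257. The function dictionary_init() simply initializes the dictionary; dictionary_lookup() returns the index of the child node with symbol character of the parent node current_node (if it exists).

Writing to the bit file is defined as: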
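For readers who want to compile the compressor, the declarations described above could look roughly like the following. This is only a sketch under assumptions: the typedefs, FIRST_CODE, EOS and the 12-bit width come from the post, while MAX_CODE, UNUSED, TABLE_SIZE, the ENTRY layout and the hashing scheme (open addressing with linear probing) are hypothetical choices of mine, not the original implementation.

```c
#include <stdint.h>

typedef uint32_t CODE;
typedef uint16_t SYMBOL;

#define BITS       12                   /* code width used in the post */
#define EOS        256u                 /* end-of-stream code (from the post) */
#define FIRST_CODE 257u                 /* first free code (from the post) */
#define MAX_CODE   ((1u << BITS) - 1u)  /* assumption: largest BITS-wide value */
#define UNUSED     UINT32_MAX           /* assumption: marks an empty slot */
#define TABLE_SIZE 5021u                /* assumption: prime larger than 2^BITS */

typedef struct {
    CODE   code;    /* code assigned to the string parent+symbol */
    CODE   parent;  /* code of the prefix string */
    SYMBOL symbol;  /* last symbol of the string */
} ENTRY;

static ENTRY dictionary[TABLE_SIZE];

/* Mark every slot as empty. */
void dictionary_init(void) {
    for (uint32_t i = 0; i < TABLE_SIZE; i++)
        dictionary[i].code = UNUSED;
}

/* Return the slot holding (parent, symbol), or the empty slot where that
 * pair should be stored. The table always has free slots because fewer
 * than 2^BITS entries are ever inserted, so the probe loop terminates. */
uint32_t dictionary_lookup(CODE parent, SYMBOL symbol) {
    uint32_t index = (((uint32_t)symbol << (BITS - 8)) ^ parent) % TABLE_SIZE;
    while (dictionary[index].code != UNUSED &&
           (dictionary[index].parent != parent ||
            dictionary[index].symbol != symbol)) {
        index = (index + 1) % TABLE_SIZE;   /* linear probing */
    }
    return index;
}
```

With this layout, the caller tells a hit from a miss by checking whether dictionary[index].code is UNUSED, which matches how compress1 uses the returned index.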
    int bit_write(BIT_FILE* bf, uint64_t data, int len)
    {
        int space, result, offset, wbits, udata;
        uint64_t* p;
        uint64_t tmp;

        udata = (int)data;
        if (bf == NULL || len < 1 || len > (8* sizeof(data)))
            return -1;
        if (bf->reading == true)
            return -1;
        while (len > 0) {
            space = bf->end - bf->next;
            if (space < 0) {
                return -1;
            }
            // if buffer is full, flush data to file and reinit BIT_IO struct
            if (space == 0) {
                result = bit_flush(bf);
                if (result < 0)
                    return -1;
            }
            p = bf->buf + (bf->next/64);
            offset = bf->next % 64;
            wbits = 64 - offset;
            if (len < wbits)
                wbits = len;
            tmp = le64toh(*p);
            tmp |= (data << offset);
            *p = htole64(tmp);
            bf->next += wbits;
            len -= wbits;
            data >>= wbits;
        }
        return 0;
    }
I open the file with another function, so bit_write takes a pointer to the structure bf as input.

Can anyone help me find the bug?
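Here is an example where the problem shows up.

If the input string is "Nel mezzo del cammi", everything works and I get the following compressed file (hex, using 12-bit encoded symbols):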
4E 50 06 6C 00 02 6D 50 06 7A A0 07 6F 00 02 64
20 10 20 30 06 61 D0 06 6D 90 06 0D A0 00 00 01
If I add one more character to the string, namely "Nel mezzo del cammin", I get the following result:
4E 50 06 6C 00 02 6D 50 06 7A A0 07 6F 00 02 64
20 10 20 30 06 61 D0 06 6D 90 06 6E D0 00 0A 00
10
In the second case, it does not write the end of stream correctly.
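When comparing dumps like these, it can help to unpack the bytes back into 12-bit codes instead of reading the hex by eye. Here is a minimal sketch, assuming the codes are packed least-significant bit first, which is what the shift logic in bit_write suggests; unpack12 is a hypothetical helper, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the idx-th 12-bit code from a byte array in
 * which codes are packed back to back, least-significant bit first.
 * Returns -1 if the code would extend past the end of the array. */
static int unpack12(const uint8_t *bytes, size_t nbytes, size_t idx) {
    size_t bit  = idx * 12;      /* position of the code's first bit */
    size_t byte = bit / 8;       /* byte holding that first bit */
    if (byte + 1 >= nbytes)
        return -1;               /* fewer than 12 bits remain */
    /* A 12-bit code starts at bit 0 or bit 4 of a byte, so two bytes
     * always cover it. */
    uint32_t pair = (uint32_t)bytes[byte] | ((uint32_t)bytes[byte + 1] << 8);
    return (int)((pair >> (bit % 8)) & 0xFFFu);
}
```

Calling unpack12 with idx = 0, 1, 2, ... until it returns -1 lists the codes in the order the compressor emitted them; values below 256 are literal bytes, 256 is EOS, and values from 257 up are dictionary entries.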
Solution: check that the buffer has enough room for the whole encoded symbol I'm about to write. Just change:

    if (space == 0)

to:

    if (space == 0 || space < len)
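Another way to make sure a symbol never runs past the end of the buffer is to limit each chunk to the room that is actually left, so a symbol may be split across a flush. The following is a minimal, self-contained sketch of that idea and not the original BIT_FILE code: BitBuf, bitbuf_init, bitbuf_flush, bitbuf_write and the in-memory sink are hypothetical names, and it works on bytes rather than 64-bit words to stay independent of endianness helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_BYTES  4    /* tiny on purpose: 12-bit symbols straddle flushes */
#define SINK_BYTES 64   /* in-memory stand-in for the output file */

typedef struct {
    uint8_t buf[BUF_BYTES];    /* staging buffer */
    size_t  next;              /* index of the next free bit in buf */
    uint8_t sink[SINK_BYTES];  /* bytes "written to file" so far */
    size_t  sink_len;
} BitBuf;

static void bitbuf_init(BitBuf *bb) {
    memset(bb, 0, sizeof *bb);
}

/* Move the buffered bits to the sink. A partially filled last byte is
 * padded with zero bits, so mid-stream this should only be called when
 * the buffer is full; otherwise call it once when closing. */
static int bitbuf_flush(BitBuf *bb) {
    assert(bb != NULL);
    size_t nbytes = (bb->next + 7) / 8;
    if (bb->sink_len + nbytes > SINK_BYTES)
        return -1;
    memcpy(bb->sink + bb->sink_len, bb->buf, nbytes);
    bb->sink_len += nbytes;
    memset(bb->buf, 0, BUF_BYTES);
    bb->next = 0;
    return 0;
}

/* Append the low len bits of data, least-significant bit first. */
static int bitbuf_write(BitBuf *bb, uint64_t data, int len) {
    if (bb == NULL || len < 1 || len > 64)
        return -1;
    while (len > 0) {
        /* Flush only when the buffer is completely full. */
        if (bb->next == BUF_BYTES * 8 && bitbuf_flush(bb) < 0)
            return -1;
        int offset = (int)(bb->next % 8);
        int wbits  = 8 - offset;          /* room left in the current byte */
        if (wbits > len)
            wbits = len;
        uint64_t mask = (1u << wbits) - 1u;
        /* Each chunk stays inside one byte of a whole-byte buffer,
         * so it can never overrun the buffer. */
        bb->buf[bb->next / 8] |= (uint8_t)((data & mask) << offset);
        bb->next += (size_t)wbits;
        len      -= wbits;
        data    >>= wbits;
    }
    return 0;
}
```

A caller would write each code with bitbuf_write(&bb, code, 12) and call bitbuf_flush(&bb) once at the end. Because the buffer is only flushed when it is exactly full, the output stays byte-aligned no matter how the symbol width relates to the buffer size.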