我正在尝试将 8 位以外的非隔行 GIF 图像添加到 PDF 文档,而不必使用PDF::Create
Perl 完全解码比特流。
作为PDF 标准LZWDecode
一部分的算法要求所有图像的最小 LZW 代码大小为 8 位,并且硬编码为仅嵌入 8 位图像。PDF::Create
到目前为止,我已经调整了图像加载器以PDF::Create
读取 5 位图像并完全解码 LZW 流。然后我可以使用编码器算法PDF::Create
将图像重新打包为 8 位。
我想做的是消除内存密集型解码/编码步骤。该线程表明,这可以通过“加宽或移位位”来使 LZW 代码适合LZWDecode
.
我联系了线程作者,他提供了一些额外的细节,特别是颜色索引的代码保持不变,但用零填充(例如,[10000]
变为[000010000]
),代码分别更改为<Clear>
和,所有其他代码偏移256 - 原始代码。<End>
<256>
<257>
<Clear>
但是,由于雇主的限制,他无法进一步详细说明。<4095>
特别是,当修改值超过(LZW代码表的最大索引)时,我不确定如何处理代码。我也不确定如何将修改后的代码重新打包成比特流。
我目前使用的算法如下。
# Read 5-bit data stream
sub ReadData5 {
my $data = shift;
my $c_size = 6; # minimium LZW code size
my $t_size = 33; # initial code table size
my ($i_buff,$i_bits) = (0,0); # input buffer
my ($o_buff,$o_bits) = (0,0); # output buffer
my $stream = ''; # bitstream
my $pos = 0;
SUB_BLOCK: while (1){
my $s = substr($data, $pos++, 1);
# get sub-block size
my $n_bytes = unpack('C', $s) or last SUB_BLOCK;
my $c_mask = (1 << $c_size) - 1;
BYTES: while (1){
# read c_size bits
while ($i_bits < $c_size){
# end of sub-block
!$n_bytes-- and next SUB_BLOCK;
$s = substr($data, $pos++, 1);
my $c = unpack('C', $s);
$i_buff |= $c << $i_bits;
$i_bits += 8;
}
# write c_size bits
my $code = $i_buff & $c_mask;
my $w_bits = $c_size;
$i_buff >>= $c_size;
$i_bits -= $c_size;
$t_size++;
if ($o_bits > 0){
$o_buff |= $code >> ($c_size - 8 + $o_bits);
$w_bits -= 8 - $o_bits;
$stream .= pack('C', $o_buff & 0xFF);
}
if ($w_bits >= 8){
$w_bits -= 8;
$stream .= pack('C', ($code >> $w_bits) & 0xFF);
}
if (($o_bits = $w_bits) > 0){
$o_buff = $code << (8 - $o_bits);
}
# clear code
if ($code == 32){
$c_size = 6;
$t_size = 33;
$c_mask = (1 << $c_size) - 1;
}
# end code
if ($code == 33){
$stream .= pack('C', $o_buff & 0xFF);
last SUB_BLOCK;
}
if ($t_size == (1 << $c_size)){
if (++$c_size > 12){
$c_size--;
} else {
$c_mask = (1 << $c_size) - 1;
}
}
}
}
# Pad with zeros to byte boundary
$stream .= '0' x (8 - length($stream) % 8);
return $stream;
}
#---------------------------------------------------------------------------
# Decode 5-bit data stream
sub UnLZW5 {
my $data = shift;
my $c_size = 6; # minimium LZW code size
my $t_size = 33; # initial code table size
my ($i_buff,$i_bits) = (0,0); # input buffer
my $stream = ''; # bitstream
my $pos = 0;
# initialize code table
my @table = map { chr($_) } 0..$t_size-2;
$table[32] = '';
my $prefix = '';
my $suffix = '';
# get first code word
while ($i_bits < $c_size){
my $d = unpack('C', substr($data, $pos++, 1));
$i_buff = ($i_buff << 8) + $d;
$i_bits += 8;
}
my $c2 = $i_buff >> ($i_bits - $c_size);
$i_bits -= $c_size;
my $c_mask = (1 << $i_bits) - 1;
$i_buff &= $c_mask;
# get remaining code words
DECOMPRESS: while ($pos < length($data)){
my $c1 = $c2;
while ($i_bits < $c_size){
my $d = unpack('C', substr($data, $pos++, 1));
$i_buff = ($i_buff << 8) + $d;
$i_bits += 8;
}
$c2 = $i_buff >> ($i_bits - $c_size);
$i_bits -= $c_size;
$c_mask = (1 << $i_bits) - 1;
$i_buff &= $c_mask;
# clear code
if ($c2 == 32){
$stream .= $table[$c1];
$#table = 32;
$c_size = 6;
$t_size = 33;
next DECOMPRESS;
}
# end code
if ($c2 == 33){
$stream .= $table[$c1];
last DECOMPRESS;
}
# get prefix and suffix
$prefix = $table[$c1] if $c1 < $t_size;
$suffix = $c2 < $t_size ? substr($table[$c2], 0, 1) : substr($prefix, 0, 1);
# write prefix
$stream .= $prefix;
# write multiple-character sequence
$table[$t_size++] = $prefix . $suffix;
# increase code size
if ($t_size == 2 ** $c_size){
if (++$c_size > 12){
$c_size--;
}
}
}
return $stream;
}