
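I'm trying to add non-interlaced GIF images with bit depths other than 8 to a PDF document using PDF::Create for Perl, without having to fully decode the bitstream.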

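The LZWDecode algorithm that is part of the PDF standard requires all images to use a minimum LZW code size of 8 bits, and PDF::Create is hard-coded to embed only 8-bit images.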

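So far, I've adapted the image loader from PDF::Create to read 5-bit images and fully decode the LZW stream. I can then use the encoder algorithm from PDF::Create to repack the image as 8-bit.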

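What I'd like to do is eliminate the memory-intensive decode/encode step. This thread suggests that this is possible by "widening or shifting bits" to make the LZW codes suitable for LZWDecode.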

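I contacted the thread's author, and he provided some additional details: the codes for color indices stay the same but are padded with zeros (e.g., [10000] becomes [000010000]), the <Clear> and <End> codes become <256> and <257> respectively, and all other codes are offset by 256 - the original <Clear> code.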
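A minimal Perl sketch of that mapping, assuming the GIF stream's minimum code size is known (5 for the images discussed here). The name `remap_code` is hypothetical. It only renumbers codes; the zero padding in the example above happens later, when each code is written into a wider field. For 5-bit data the offset is 256 - 32 = 224.

```perl
use strict;
use warnings;

# Hypothetical helper: translate one GIF LZW code (minimum code size
# $min_size, e.g. 5 for a 5-bit image) into PDF LZWDecode numbering,
# following the rule described above. It does NOT handle codes whose
# translated value no longer fits in the 4096-entry PDF table.
sub remap_code {
    my ($code, $min_size) = @_;
    my $clear = 1 << $min_size;      # GIF <Clear> code (32 for 5-bit data)
    my $end   = $clear + 1;          # GIF <End> code (33 for 5-bit data)

    return $code if $code < $clear;  # color index: same value, just wider later
    return 256   if $code == $clear; # <Clear> becomes PDF code 256
    return 257   if $code == $end;   # <End> becomes PDF code 257
    return $code + (256 - $clear);   # table entries shift by 256 - <Clear>
}
```

With that offset, GIF code 3871 maps to 4095, so any GIF table code of 3872 or above has no counterpart in the 4096-entry PDF table. That is the overflow case described next, and this sketch does not try to resolve it.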

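However, he couldn't elaborate further because of restrictions from his employer. In particular, I'm not sure how to handle codes whose modified value exceeds <4095> (the maximum index of the LZW code table). I'm also not sure how to repack the modified codes into a bitstream.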
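For the repacking half of the question, one option is an MSB-first bit writer: LZWDecode reads codes most-significant bit first, whereas GIF packs them least-significant bit first. The sketch below (`pack_codes_msb` is a hypothetical name) takes `[code, width]` pairs and leaves the choice of width to the caller, because the PDF decoder's width schedule (see the `EarlyChange` entry of the LZWDecode filter) differs from GIF's and is not modeled here.

```perl
use strict;
use warnings;

# Hypothetical helper: pack [code, width] pairs into a byte string,
# most-significant bit first, as LZWDecode expects. The caller supplies
# the width the PDF decoder will use for each code.
sub pack_codes_msb {
    my @pairs = @_;
    my ($acc, $nbits) = (0, 0);   # bit accumulator and number of bits held
    my $out = '';

    for my $pair (@pairs) {
        my ($code, $width) = @$pair;
        $acc    = ($acc << $width) | ($code & ((1 << $width) - 1));
        $nbits += $width;
        while ($nbits >= 8) {                 # flush every complete byte
            $nbits -= 8;
            $out   .= chr(($acc >> $nbits) & 0xFF);
        }
        $acc &= (1 << $nbits) - 1;            # keep only the unflushed bits
    }
    # pad the final partial byte with zero bits
    $out .= chr(($acc << (8 - $nbits)) & 0xFF) if $nbits;
    return $out;
}
```

For example, `pack_codes_msb([256, 9], [257, 9])` (a <Clear> followed by an <End>) yields the three bytes 0x80 0x40 0x40.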

The algorithms I'm currently using are below.

# Read 5-bit data stream

sub ReadData5 {

    my $data = shift;

    my $c_size = 6;                # minimum LZW code size
    my $t_size = 33;               # initial code table size
    my ($i_buff,$i_bits) = (0,0);  # input buffer
    my ($o_buff,$o_bits) = (0,0);  # output buffer

    my $stream = '';               # bitstream
    my $pos    = 0;

    SUB_BLOCK: while (1){
        my $s = substr($data, $pos++, 1);

        # get sub-block size
        my $n_bytes  = unpack('C', $s) or last SUB_BLOCK;
        my $c_mask   = (1 << $c_size) - 1;

        BYTES: while (1){
            # read c_size bits
            while ($i_bits < $c_size){

                # end of sub-block
                !$n_bytes-- and next SUB_BLOCK;

                $s = substr($data, $pos++, 1);
                my $c = unpack('C', $s);

                $i_buff |= $c << $i_bits;
                $i_bits += 8;
            }

            # write c_size bits
            my $code   = $i_buff & $c_mask;

            my $w_bits = $c_size;
            $i_buff  >>= $c_size;
            $i_bits   -= $c_size;
            $t_size++;

            if ($o_bits > 0){
                $o_buff |= $code >> ($c_size - 8 + $o_bits);
                $w_bits -= 8 - $o_bits;
                $stream .= pack('C', $o_buff & 0xFF);
            }

            if ($w_bits >= 8){
                $w_bits -= 8;
                $stream .= pack('C', ($code >> $w_bits) & 0xFF);
            }

            if (($o_bits = $w_bits) > 0){
                $o_buff = $code << (8 - $o_bits);
            }

            # clear code
            if ($code == 32){
                $c_size   = 6;
                $t_size   = 33;
                $c_mask   = (1 << $c_size) - 1;
            }

            # end code
            if ($code == 33){
                $stream .= pack('C', $o_buff & 0xFF);
                last SUB_BLOCK;
            }

            if ($t_size == (1 << $c_size)){
                if (++$c_size > 12){
                    $c_size--;
                } else {
                    $c_mask = (1 << $c_size) - 1;
                }
            }
        }
    }

    # Pad with zeros to byte boundary
    $stream .= '0' x (8 - length($stream) % 8);

    return $stream;
}
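`ReadData5` handles the GIF sub-block framing (a length byte followed by that many data bytes, ended by a zero-length block) inside its bit-reading loop. A minimal sketch of stripping the framing up front instead, so that a code reader only ever sees contiguous LZW data. It assumes, as `ReadData5` does, that `$data` starts at the first length byte; `join_sub_blocks` is a hypothetical name. It copies only the compressed data, not decoded pixels.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate the payloads of a GIF image-data
# sub-block sequence. $data is assumed to start at the first length byte
# (i.e. after the LZW minimum code size byte).
sub join_sub_blocks {
    my ($data) = @_;
    my ($pos, $out) = (0, '');
    while ($pos < length $data) {
        my $n = ord substr($data, $pos++, 1);  # sub-block length; 0 terminates
        last if $n == 0;
        $out .= substr($data, $pos, $n);
        $pos += $n;
    }
    return $out;
}
```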

#---------------------------------------------------------------------------

# Decode 5-bit data stream

sub UnLZW5 {
    my $data = shift;

    my $c_size = 6;                 # minimum LZW code size
    my $t_size = 33;                # initial code table size
    my ($i_buff,$i_bits) = (0,0);   # input buffer

    my $stream = '';                # bitstream
    my $pos    = 0;

    # initialize code table
    my @table  = map { chr($_) } 0..$t_size-2;
    $table[32] = '';
    my $prefix = '';
    my $suffix = '';

    # get first code word
    while ($i_bits < $c_size){
        my $d     = unpack('C', substr($data, $pos++, 1));
        $i_buff   = ($i_buff << 8) + $d;
        $i_bits += 8;
    }

    my $c2     = $i_buff >> ($i_bits - $c_size);
    $i_bits   -= $c_size;
    my $c_mask = (1 << $i_bits) - 1;
    $i_buff   &= $c_mask;

    # get remaining code words
    DECOMPRESS: while ($pos < length($data)){
        my $c1 = $c2;

        while ($i_bits < $c_size){
            my $d     = unpack('C', substr($data, $pos++, 1));
            $i_buff   = ($i_buff << 8) + $d;
            $i_bits  += 8;
        }

        $c2      = $i_buff >> ($i_bits - $c_size);
        $i_bits -= $c_size;
        $c_mask  = (1 << $i_bits) - 1;
        $i_buff &= $c_mask;

        # clear code
        if ($c2 == 32){
            $stream  .= $table[$c1];
            $#table   = 32;
            $c_size   = 6;
            $t_size   = 33;
            next DECOMPRESS;
        }

        # end code
        if ($c2 == 33){
            $stream .= $table[$c1];
            last DECOMPRESS;
        }

        # get prefix and suffix
        $prefix = $table[$c1] if $c1 < $t_size;
        $suffix = $c2 < $t_size ? substr($table[$c2], 0, 1) : substr($prefix, 0, 1);

        # write prefix
        $stream .= $prefix;

        # write multiple-character sequence
        $table[$t_size++] = $prefix . $suffix;

        # increase code size
        if ($t_size == 2 ** $c_size){
            if (++$c_size > 12){
                $c_size--;
            }
        }
    }

    return $stream;
}

1 Answer


Doing it one at a time is slow. Doing it all at once uses too much memory. Do it a chunk at a time.

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 5.

my $in_bytes = ...;
my $out_bytes = '';
while (length $in_bytes) {
   # Remove the next chunk from the front of the input.
   my $bytes = substr($in_bytes, 0, $BUFFER_SIZE, '');

   # Unpack from 5 bit fields into numbers.
   my @vals = map { oct("0b$_") } unpack('B*', $bytes) =~ /(.{5})/g;

   # Transform @vals into 8 bit values here.

   # Pack to 8 bit fields.
   $out_bytes .= pack('C*', @vals);
}

Since you aren't transforming the values at all (only changing how they're stored), this simplifies to:

my $BUFFER_SIZE = 5 * 50_000;  # Must be a multiple of 5.

my $in_bytes = ...;
my $out_bytes = '';
while (length $in_bytes) {
   my $bytes = substr($in_bytes, 0, $BUFFER_SIZE, '');

   # Unpack from 5 bit fields, widening each one to 8 bits.
   my $bits = unpack('B*', $bytes);
   $bits = join '', map { "000$_" } $bits =~ /(.{5})/g;
   $out_bytes .= pack('B*', $bits);
}

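You didn't say what to do with the extra bits (those at the end of the input that don't form a complete 5-bit field). I just ignore them.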
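Wrapping the chunked `unpack`/`oct` approach in a sub makes the handling of extra bits easy to check on a small input. A minimal sketch; `widen5` is a hypothetical name, and it walks the input with an offset instead of consuming `$in_bytes`, so the caller's string is left intact.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the chunked approach: expand packed 5-bit
# fields (MSB first) into one byte per field. Trailing bits that do not
# form a complete 5-bit field are dropped.
sub widen5 {
    my ($in) = @_;
    my $out = '';
    for (my $pos = 0; $pos < length $in; $pos += 5) {
        my $bits = unpack('B*', substr($in, $pos, 5));  # 5 bytes = 8 fields
        $out .= pack('C*', map { oct("0b$_") } $bits =~ /(.{5})/g);
    }
    return $out;
}
```

`widen5("\x08\x42\x10\x84\x21")` returns eight bytes of value 1, and `widen5("\xFF")` returns the single byte 0x1F, the three leftover bits being dropped.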


An alternative that doesn't create a bit string:

my $in_bytes = ...;
my $out_bytes = '';
while (length $in_bytes) {
    # Take up to 5 bytes (eight 5-bit fields) off the front of the input.
    my @bytes = unpack('C*', substr($in_bytes, 0, 5, ''));

    # 00000111 11222223 33334444 45555566 66677777

    $out_bytes .= chr(                            (($bytes[0] >> 3) & 0x1F));
    last if @bytes == 1;
    $out_bytes .= chr((($bytes[0] << 2) & 0x1C) | (($bytes[1] >> 6) & 0x03));
    $out_bytes .= chr(                            (($bytes[1] >> 1) & 0x1F));
    last if @bytes == 2;
    $out_bytes .= chr((($bytes[1] << 4) & 0x10) | (($bytes[2] >> 4) & 0x0F));
    last if @bytes == 3;
    $out_bytes .= chr((($bytes[2] << 1) & 0x1E) | (($bytes[3] >> 7) & 0x01));
    $out_bytes .= chr(                            (($bytes[3] >> 2) & 0x1F));
    last if @bytes == 4;
    $out_bytes .= chr((($bytes[3] << 3) & 0x18) | (($bytes[4] >> 5) & 0x07));
    $out_bytes .= chr(                            ( $bytes[4]       & 0x1F));
}

The advantage of this last approach is that it translates directly into C, where it is particularly efficient:

STRLEN in_len;
const unsigned char* in = (const unsigned char*)SvPVbyte(sv, in_len);

/* One output byte per complete 5-bit field; leftover bits are ignored. */
STRLEN out_len = in_len * 8 / 5;
char* out;
char* out_cur;
const unsigned char* in_end = in + in_len;
SV* result;

Newx(out, out_len + 1, char);  /* +1 so a zero-length input is still a valid allocation */
out_cur = out;

while (in != in_end) {
    *(out_cur++) =                          ((*in >> 3) & 0x1F);
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 2) & 0x1C) | ((*in >> 6) & 0x03);
    *(out_cur++) =                          ((*in >> 1) & 0x1F);
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 4) & 0x10) | ((*in >> 4) & 0x0F);
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 1) & 0x1E) | ((*in >> 7) & 0x01);
    *(out_cur++) =                          ((*in >> 2) & 0x1F);
    if (++in == in_end) break;
    *(out_cur++) = ((in[-1] << 3) & 0x18) | ((*in >> 5) & 0x07);
    *(out_cur++) =                          ( *in       & 0x1F);
    ++in;  /* move past the 5th byte of this group */
}

result = newSVpvn(out, out_len);  /* newSVpvn copies the buffer */
Safefree(out);
return result;
Answered 2012-06-29T00:08:14.263