java - Java: CRC error when using setDictionary on a GZIPOutputStream's Deflater

Question

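I'm trying to read a stream of data from standard input, compress it one 128-byte block at a time, and write the result to standard output. (For example: "cat file.txt | java Dict | gzip -d | cmp file.txt", where file.txt contains only some ASCII characters.)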

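For each subsequent block, I also need to use a 32-byte dictionary taken from the end of the previous 128-byte block. (The first block uses its own first 32 bytes as its dictionary.) When I don't set a dictionary at all, compression works fine. However, when I do set one, gzip reports an error when it tries to decompress the data: "gzip: stdin: invalid compressed data--crc error".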

I've tried adding or changing several parts of the code, but nothing has worked so far, and I haven't had any luck finding a solution on Google.

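Things I've tried:

Adding "def.reset()" before "def.setDictionary(b)" near the bottom of the code. It didn't work.
Setting the dictionary only for the blocks after the first one (so the first block uses no dictionary). It didn't work.
Calling updateCRC with the "input" array before or after compressor.write(input, 0, bytesRead). It didn't work.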

I'd really appreciate any suggestions. Is there something obvious I'm missing or doing wrong?

This is what's in my Dict.java file:

import java.io.*;
import java.util.zip.GZIPOutputStream;

public class Dict {
  protected static final int BLOCK_SIZE = 128;
  protected static final int DICT_SIZE = 32;

  public static void main(String[] args) {
    InputStream stdinBytes = System.in;
    byte[] input = new byte[BLOCK_SIZE];
    byte[] dict = new byte[DICT_SIZE];
    int bytesRead = 0;

    try {
        DictGZIPOuputStream compressor = new DictGZIPOuputStream(System.out);
        bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
        if (bytesRead >= DICT_SIZE) {
            System.arraycopy(input, 0, dict, 0, DICT_SIZE);
            compressor.setDictionary(dict);
        }

        do {
            compressor.write(input, 0, bytesRead);
            compressor.flush();

            if (bytesRead == BLOCK_SIZE) {
                System.arraycopy(input, BLOCK_SIZE-DICT_SIZE-1, dict, 0, DICT_SIZE);
                compressor.setDictionary(dict);
            }
            bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
        } while (bytesRead > 0);

        compressor.finish();
    }
    catch (IOException e) {e.printStackTrace();}
  }

  public static class DictGZIPOuputStream extends GZIPOutputStream {
    public DictGZIPOuputStream(OutputStream out) throws IOException {
        super(out);
    }

    public void setDictionary(byte[] b) {
        def.setDictionary(b);
    }
    public void updateCRC(byte[] input) {
        crc.update(input);
    }
  }
}
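A quick way to see which bytes each offset selects: in the loop above, "BLOCK_SIZE-DICT_SIZE-1" starts the copy at index 95, so the dictionary covers bytes 95..126 and skips the block's last byte, whereas "BLOCK_SIZE-DICT_SIZE" gives the actual last 32 bytes (96..127). This can matter because gzip -d has no dictionary of its own: it resolves back-references against the bytes it has just decompressed, so a dictionary that does not match those bytes exactly can make matches decode to the wrong data. The sketch below only demonstrates the index arithmetic; DictSlice and dictFrom are hypothetical names.

```java
import java.util.Arrays;

public class DictSlice {
    static final int BLOCK_SIZE = 128;
    static final int DICT_SIZE = 32;

    // Copies DICT_SIZE bytes out of `block`, starting at `offset`.
    public static byte[] dictFrom(byte[] block, int offset) {
        byte[] dict = new byte[DICT_SIZE];
        System.arraycopy(block, offset, dict, 0, DICT_SIZE);
        return dict;
    }

    public static void main(String[] args) {
        byte[] block = new byte[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i] = (byte) i; // each byte holds its own index, so values show positions
        }
        byte[] offByOne = dictFrom(block, BLOCK_SIZE - DICT_SIZE - 1); // offset used in the question
        byte[] tail = dictFrom(block, BLOCK_SIZE - DICT_SIZE);         // true last DICT_SIZE bytes
        System.out.println(offByOne[0] + ".." + offByOne[DICT_SIZE - 1]); // 95..126
        System.out.println(tail[0] + ".." + tail[DICT_SIZE - 1]);         // 96..127
        // Arrays.copyOfRange expresses the same tail slice without a preallocated buffer
        System.out.println(Arrays.equals(tail, Arrays.copyOfRange(block, BLOCK_SIZE - DICT_SIZE, BLOCK_SIZE)));
    }
}
```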

score 1 · Accepted Answer

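I don't know exactly how zlib works internally, but as I understand DictGZIPOutputStream, the write() method it inherits from GZIPOutputStream updates the CRC over that byte array after writing it. So if you also call updateCRC() yourself, the CRC is updated twice and no longer matches the data. When gzip -d then checks the stream, it reports "invalid compressed data--crc error" because of that double update.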
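The double-update effect can be checked with java.util.zip.CRC32 directly, which is the class behind GZIPOutputStream's protected crc field: feeding the same bytes twice yields the checksum of the data repeated twice, not of the data itself, so it no longer matches what gzip -d computes over the decompressed output. A minimal sketch; CrcTwice and crcOf are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcTwice {
    // Runs `data` through a fresh CRC32 `times` times and returns the result.
    public static long crcOf(byte[] data, int times) {
        CRC32 crc = new CRC32();
        for (int i = 0; i < times; i++) {
            crc.update(data, 0, data.length);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = "hello world, how are you?".getBytes(StandardCharsets.US_ASCII);
        long once = crcOf(block, 1);  // what write() alone records
        long twice = crcOf(block, 2); // write() plus an extra updateCRC() on the same bytes
        System.out.printf("once=%08x twice=%08x equal=%b%n", once, twice, once == twice);
    }
}
```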

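I also noticed that you don't close the compressor after using it. When I ran the code you pasted above, it gave the error "gzip: stdin: unexpected end of file". So make sure flush() and close() are called at the end. With that in mind, here is what I have: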
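Why the end of the stream matters: GZIPOutputStream writes the final deflate block and the 8-byte gzip trailer (CRC-32 and uncompressed length) only in finish(), which close() calls; its default flush() does not force either out. The sketch below captures the compressed bytes just before and just after close() and reads both back with GZIPInputStream. TrailerCheck and its methods are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TrailerCheck {
    // Returns { bytes written before close(), bytes written after close() }.
    public static byte[][] gzipBeforeAndAfterClose(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bytes); // header is written here
        gz.write(data, 0, data.length);
        gz.flush();                               // default flush(): no final block, no trailer
        byte[] beforeClose = bytes.toByteArray();
        gz.close();                               // finish(): final deflate block + CRC-32/length trailer
        return new byte[][] {beforeClose, bytes.toByteArray()};
    }

    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world, how are you?1e3djw\n".getBytes(StandardCharsets.US_ASCII);
        byte[][] out = gzipBeforeAndAfterClose(data);
        System.out.print(new String(gunzip(out[1]), StandardCharsets.US_ASCII));
        try {
            gunzip(out[0]);
        } catch (IOException e) {
            System.out.println("without close(): " + e); // an EOFException from the truncated stream
        }
    }
}
```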

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;


public class Dict
{
    protected static final int BLOCK_SIZE = 128;
    protected static final int DICT_SIZE = 32;

    public static void main(String[] args)
    {
        InputStream stdinBytes = System.in;
        byte[] input = new byte[BLOCK_SIZE];
        byte[] dict = new byte[DICT_SIZE];
        int bytesRead = 0;

        try
        {
            DictGZIPOutputStream compressor = new DictGZIPOutputStream(System.out);
            bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);

            if (bytesRead >= DICT_SIZE)
            {
                System.arraycopy(input, 0, dict, 0, DICT_SIZE);
            }

            // Loop only while the last read returned data, so empty input is handled too
            while (bytesRead > 0)
            {
                // GZIPOutputStream.write() already updates the CRC; don't update it again
                compressor.write(input, 0, bytesRead);

                if (bytesRead == BLOCK_SIZE)
                {
                    // Copy the last DICT_SIZE bytes of this block (indices 96..127)
                    System.arraycopy(input, BLOCK_SIZE - DICT_SIZE, dict, 0, DICT_SIZE);
                    compressor.setDictionary(dict);
                }

                bytesRead = stdinBytes.read(input, 0, BLOCK_SIZE);
            }
            compressor.flush();
            // close() calls finish(), which writes the gzip trailer (CRC-32 and length)
            compressor.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

    }

    public static class DictGZIPOutputStream extends GZIPOutputStream
    {

        public DictGZIPOutputStream(OutputStream out) throws IOException
        {
            super(out);
        }

        public void setDictionary(byte[] b)
        {
            def.setDictionary(b);
        }

        public void updateCRC(byte[] input)
        {
            crc.update(input);
        }                       
    }

}

Test results on the console:

$ cat file.txt 
hello world, how are you?1e3djw
hello world, how are you?1e3djw adfa asdfas

$ cat file.txt | java Dict | gzip -d | cmp file.txt ; echo $?
0
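The sample file.txt above is well under 128 bytes, so the "bytesRead == BLOCK_SIZE" branch, and with it setDictionary(), never runs in that test. For input that spans several blocks, one way to get per-block dictionaries that plain gzip -d can decode is to drive a raw Deflater directly instead of subclassing GZIPOutputStream. This sketch rests on stated assumptions: the gzip format has no field for a preset dictionary, so the first block gets none, and every later dictionary must be exactly the bytes that immediately precede the block in the uncompressed data; and the deflater should be at a flushed, byte-aligned boundary with a fresh window when the dictionary is set, which is why each block ends with a SYNC_FLUSH and the next starts with reset(). The header and trailer are written by hand. BlockGzip, compress and writeIntLE are hypothetical names, and the code assumes Java 9+ for readAllBytes().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class BlockGzip {
    static final int BLOCK_SIZE = 128;
    static final int DICT_SIZE = 32;

    // Compresses `data` into one gzip member. Every block after the first is compressed
    // from a fresh window primed with the DICT_SIZE bytes that precede it.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 10-byte gzip header: magic, CM=8 (deflate), no flags, MTIME=0, XFL=0, OS=255 (unknown)
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff});

        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, no zlib wrapper
        CRC32 crc = new CRC32();
        byte[] buf = new byte[512];
        try {
            for (int start = 0; start < data.length; start += BLOCK_SIZE) {
                int len = Math.min(BLOCK_SIZE, data.length - start);
                if (start > 0) {
                    // Drop all history, then prime the window with exactly the bytes the
                    // decompressor has just produced, so back-references resolve identically.
                    def.reset();
                    int dictLen = Math.min(DICT_SIZE, start);
                    def.setDictionary(data, start - dictLen, dictLen);
                }
                def.setInput(data, start, len);
                // SYNC_FLUSH consumes all input and ends on a byte boundary, so the next
                // block's raw deflate data can be appended directly.
                int n;
                do {
                    n = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                    out.write(buf, 0, n);
                } while (n == buf.length);
                crc.update(data, start, len); // each input byte enters the CRC exactly once
            }
            def.finish(); // emit the final deflate block (BFINAL=1)
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
        } finally {
            def.end(); // release native zlib memory
        }
        // 8-byte trailer: CRC-32, then ISIZE (input length mod 2^32), both little-endian
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, data.length);
        return out.toByteArray();
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        out.write(v);        // write(int) keeps only the low 8 bits
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }

    public static void main(String[] args) throws IOException {
        // Reads all of stdin into memory for brevity, unlike the streaming original.
        System.out.write(compress(System.in.readAllBytes()));
        System.out.flush();
    }
}
```

Usage mirrors the question's pipeline: cat file.txt | java BlockGzip | gzip -d | cmp file.txt. The trade-off is compression ratio: because reset() discards everything except the 32-byte dictionary, each block can only refer back to its own bytes and the 32 bytes before it.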
