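Honestly, this is frustrating, because I think I know the cause, but I can't pin down where in my code it happens. Basically, for this assignment we're supposed to read in an input stream, split it into 128-byte blocks, and then compress each block while using the last 32 bytes of the previous block as the dictionary.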
import java.io.*;
import java.util.zip.*;

public class TestCase
{
    protected static final int BLOCK_SIZE = 128;
    protected static final int DICT_SIZE = 32;

    public static void main(String[] args)
    {
        BufferedInputStream inBytes = new BufferedInputStream(System.in);
        byte[] buff = new byte[BLOCK_SIZE];
        byte[] dict = new byte[DICT_SIZE];
        int bytesRead = 0;
        try
        {
            DGZIPOutputStream compressor = new DGZIPOutputStream(System.out);
            bytesRead = inBytes.read(buff);
            if (bytesRead >= DICT_SIZE)
            {
                System.arraycopy(buff, 0, dict, 0, DICT_SIZE);
            }
            while (bytesRead != -1)
            {
                compressor.write(buff, 0, bytesRead);
                if (bytesRead == BLOCK_SIZE)
                {
                    System.arraycopy(buff, BLOCK_SIZE-DICT_SIZE, dict, 0, DICT_SIZE);
                    compressor.setDictionary(dict);
                }
                bytesRead = inBytes.read(buff);
            }
            compressor.flush();
            compressor.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
            System.exit(-1);
        }
    }

    public static class DGZIPOutputStream extends GZIPOutputStream
    {
        public DGZIPOutputStream(OutputStream out) throws IOException
        {
            super(out);
        }

        public void setDictionary(byte[] b)
        {
            def.setDictionary(b);
        }

        public void updateCRC(byte[] input)
        {
            crc.update(input);
            System.out.println("Called!");
        }
    }
}
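I'm literally off by one byte. I think it happens when I call write(), which I know updates the CRC with the byte array. I suspect that for some reason updateCRC is being called twice, but for the life of me I can't figure out where. Or maybe I'm completely off base. It's only a single byte, though, and when I take the dictionary out it works fine, so... I'm really not sure.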
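To make the "CRC updated twice" suspicion concrete, here is a minimal sketch of how java.util.zip.CRC32 behaves. The gzip trailer carries a CRC-32 of the uncompressed data, so feeding the same bytes into the checksum a second time would generally produce a value the decompressor rejects, whereas splitting the data into chunks does not change the result. The class name CrcDemo and the helper crcOf are made up for this illustration and are not part of the program above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcDemo {
    // Hypothetical helper: feed the chunks into one CRC32 in order.
    static long crcOf(byte[]... chunks) {
        CRC32 crc = new CRC32();
        for (byte[] chunk : chunks) {
            crc.update(chunk);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] a = "hello ".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "world".getBytes(StandardCharsets.US_ASCII);
        byte[] whole = "hello world".getBytes(StandardCharsets.US_ASCII);

        // Chunking does not matter: the CRC depends only on the byte sequence.
        System.out.println(crcOf(a, b) == crcOf(whole)); // prints "true"

        // Feeding a chunk twice checksums a different byte sequence
        // ("hello hello world"), so the two values are not expected to match.
        System.out.println(crcOf(a, a, b) == crcOf(whole));

        // Standard CRC-32 check value for the ASCII string "123456789".
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(crcOf(check))); // prints "cbf43926"
    }
}
```

If the checksum really were being updated twice, the mismatch would show up as exactly the kind of "crc error" that gzip reports, but this sketch alone does not show that this is what happens in the program above.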
Edit: so when I compile and test it:
$ cat file.txt
hello world, how are you? 123efd4
KEYBOARDSMASHR#@Q)KF@_{KFSKFDS
000000000000000000000000000000000000000000000000000
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
pwfprejgewojgw
12345678901234567890
!@#$%^&*(!@#$%^&*(A
$ cat file.txt | java TestCase | gzip -d | cmp file.txt ; echo $?
gzip: stdin: invalid compressed data--crc error
file.txt - differ: byte 1, line 1
1
(Ignore my choice of file contents, I was sleepy last night.)
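The same check as the shell pipeline can also be done inside Java, which makes it easier to step through in a debugger. The sketch below is only a baseline: it writes 128-byte blocks through a plain GZIPOutputStream with no dictionary (the case reported above as working) and reads the result back with GZIPInputStream, which throws an IOException if the trailer CRC does not match. The class and method names (GzipRoundTrip, gzip, gunzip) are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress data by writing it in blockSize-byte chunks, no dictionary.
    static byte[] gzip(byte[] data, int blockSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            for (int off = 0; off < data.length; off += blockSize) {
                gz.write(data, off, Math.min(blockSize, data.length - off));
            }
        } // close() finishes the stream and writes the CRC/length trailer
        return sink.toByteArray();
    }

    // Decompress; GZIPInputStream verifies the trailer and throws on mismatch.
    static byte[] gunzip(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[256];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            text.append("hello world, how are you? ").append(i).append('\n');
        }
        byte[] original = text.toString().getBytes(StandardCharsets.US_ASCII);

        byte[] restored = gunzip(gzip(original, 128));
        System.out.println(Arrays.equals(original, restored)
            ? "round trip OK" : "MISMATCH"); // prints "round trip OK"
    }
}
```

Swapping the dictionary-setting stream into gzip() would then show whether the failure reproduces without the shell tools involved.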
Edit: Solved
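For reference, the block-and-dictionary scheme described at the top can be sketched with java.util.zip.Deflater and Inflater directly. This is not the assignment's gzip output and not necessarily how the problem was solved: it assumes each block is compressed as its own zlib-format stream, so the decompressor is told through needsDictionary() that a preset dictionary is required and can supply the last 32 bytes of the block it just restored. All names here (DictBlockSketch, deflateBlock, inflateBlock, roundTrip, tail) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictBlockSketch {
    static final int BLOCK_SIZE = 128;
    static final int DICT_SIZE = 32;

    // Compress one block as a standalone zlib stream; dict may be null.
    static byte[] deflateBlock(byte[] block, byte[] dict) {
        Deflater def = new Deflater();
        try {
            if (dict != null) def.setDictionary(dict); // before any input
            def.setInput(block);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!def.finished()) {
                out.write(buf, 0, def.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            def.end();
        }
    }

    // Decompress one block, supplying dict only when the stream asks for it.
    static byte[] inflateBlock(byte[] packed, byte[] dict) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsDictionary()) {
                    if (dict == null) throw new DataFormatException("dictionary required");
                    inf.setDictionary(dict);
                } else if (n == 0 && inf.needsInput()) {
                    throw new DataFormatException("truncated block");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }

    // Last DICT_SIZE bytes of a block, or null if the block is too short.
    static byte[] tail(byte[] block) {
        return block.length >= DICT_SIZE
            ? Arrays.copyOfRange(block, block.length - DICT_SIZE, block.length)
            : null;
    }

    // Compress and immediately restore every block, chaining dictionaries.
    static byte[] roundTrip(byte[] data) throws DataFormatException {
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        byte[] encDict = null; // taken from the original previous block
        byte[] decDict = null; // taken from the restored previous block
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + BLOCK_SIZE, data.length));
            byte[] unpacked = inflateBlock(deflateBlock(block, encDict), decDict);
            restored.write(unpacked, 0, unpacked.length);
            encDict = tail(block);
            decDict = tail(unpacked);
        }
        return restored.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = new byte[300];
        for (int i = 0; i < data.length; i++) data[i] = (byte) "abcdefghij".charAt(i % 10);
        System.out.println(Arrays.equals(data, roundTrip(data)));
    }
}
```

The point of the sketch is that both sides must agree on the dictionary: a stock `gzip -d` has no way to be handed one, which is worth keeping in mind when reading the test output above.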