22

我想使用 Gzip 压缩在 java 中压缩输入流。

假设我们有一个未压缩的输入流(1GB 的数据......)。因此,我想要来自源的压缩输入流:

public InputStream getCompressedStream(InputStream unCompressedStream) {

    // Not working because it's uncompressing the stream, I want the opposite.
    return new GZIPInputStream(unCompressedStream); 

}
4

12 回答 12

16

DeflaterInputStream 不是您想要的,因为它缺少 gzip 标头/预告片并且使用稍微不同的压缩。

如果您从 OutputStream(推)更改为 InputStream(拉),您需要做不同的事情。

GzipOutputStream 的作用是:

  • 写一个静态 gzip 头文件
  • 使用 DeflaterOutputStream 编写一个放气的流。写入流时,从未压缩的数据构建 CRC32 校验和,并计算字节数
  • 写一个包含 CRC32 校验和和字节数的尾部。

如果你想对 InputStreams 做同样的事情,你需要一个包含以下内容的流:

  • 标题
  • 泄气的内容
  • 拖车

最好的方法是提供 3 个不同的流并将它们合并为一个。幸运的是,有 SequenceInputStream 可以为您组合流。

这是我的实现加上一个简单的单元测试:

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterInputStream;
import java.util.zip.DeflaterOutputStream;

/**
 * @author mwyraz
 * Wraps an input stream and compresses it's contents. Similiar to DeflateInputStream but adds GZIP-header and trailer
 * See GzipOutputStream for details.
 * LICENSE: Free to use. Contains some lines from GzipOutputStream, so oracle's license might apply as well!
 */
public class GzipCompressingInputStream extends SequenceInputStream
{
    public GzipCompressingInputStream(InputStream in) throws IOException
    {
        this(in,512);
    }
    public GzipCompressingInputStream(InputStream in, int bufferSize) throws IOException
    {
        super(new StatefullGzipStreamEnumerator(in,bufferSize));
    }

    static enum StreamState
    {
        HEADER,
        CONTENT,
        TRAILER
    }

    protected static class StatefullGzipStreamEnumerator implements Enumeration<InputStream>
    {

        protected final InputStream in;
        protected final int bufferSize;
        protected StreamState state;

        public StatefullGzipStreamEnumerator(InputStream in, int bufferSize)
        {
            this.in=in;
            this.bufferSize=bufferSize;
            state=StreamState.HEADER;
        }

        public boolean hasMoreElements()
        {
            return state!=null;
        }
        public InputStream nextElement()
        {
            switch (state)
            {
                case HEADER:
                    state=StreamState.CONTENT;
                    return createHeaderStream();
                case CONTENT:
                    state=StreamState.TRAILER;
                    return createContentStream();
                case TRAILER:
                    state=null;
                    return createTrailerStream();
            }
            return null;
        }

        static final int GZIP_MAGIC = 0x8b1f;
        static final byte[] GZIP_HEADER=new byte[] {
                (byte) GZIP_MAGIC,        // Magic number (short)
                (byte)(GZIP_MAGIC >> 8),  // Magic number (short)
                Deflater.DEFLATED,        // Compression method (CM)
                0,                        // Flags (FLG)
                0,                        // Modification time MTIME (int)
                0,                        // Modification time MTIME (int)
                0,                        // Modification time MTIME (int)
                0,                        // Modification time MTIME (int)
                0,                        // Extra flags (XFLG)
                0                         // Operating system (OS)
        };
        protected InputStream createHeaderStream()
        {
            return new ByteArrayInputStream(GZIP_HEADER);
        }
        protected InternalGzipCompressingInputStream contentStream;
        protected InputStream createContentStream()
        {
            contentStream=new InternalGzipCompressingInputStream(new CRC32InputStream(in), bufferSize);
            return contentStream;
        }
        protected InputStream createTrailerStream()
        {
            return new ByteArrayInputStream(contentStream.createTrailer());
        }
    }

    /**
     * Internal stream without header/trailer  
     */
    protected static class CRC32InputStream extends FilterInputStream
    {
        protected CRC32 crc = new CRC32();
        protected long byteCount;
        public CRC32InputStream(InputStream in)
        {
            super(in);
        }

        @Override
        public int read() throws IOException
        {
            int val=super.read();
            if (val>=0)
            {
                crc.update(val);
                byteCount++;
            }
            return val;
        }
        @Override
        public int read(byte[] b, int off, int len) throws IOException
        {
            len=super.read(b, off, len);
            if (len>=0)
            {
                crc.update(b,off,len);
                byteCount+=len;
            }
            return len;
        }
        public long getCrcValue()
        {
            return crc.getValue();
        }
        public long getByteCount()
        {
            return byteCount;
        }
    }

    /**
     * Internal stream without header/trailer  
     */
    protected static class InternalGzipCompressingInputStream extends DeflaterInputStream
    {
        protected final CRC32InputStream crcIn;
        public InternalGzipCompressingInputStream(CRC32InputStream in, int bufferSize)
        {
            super(in, new Deflater(Deflater.DEFAULT_COMPRESSION, true),bufferSize);
            crcIn=in;
        }
        public void close() throws IOException
        {
            if (in != null)
            {
                try
                {
                    def.end();
                    in.close();
                }
                finally
                {
                    in = null;
                }
            }
        }

        protected final static int TRAILER_SIZE = 8;

        public byte[] createTrailer()
        {
            byte[] trailer= new byte[TRAILER_SIZE];
            writeTrailer(trailer, 0);
            return trailer;
        }

        /*
         * Writes GZIP member trailer to a byte array, starting at a given
         * offset.
         */
        private void writeTrailer(byte[] buf, int offset)
        {
            writeInt((int)crcIn.getCrcValue(), buf, offset); // CRC-32 of uncompr. data
            writeInt((int)crcIn.getByteCount(), buf, offset + 4); // Number of uncompr. bytes
        }

        /*
         * Writes integer in Intel byte order to a byte array, starting at a
         * given offset.
         */
        private void writeInt(int i, byte[] buf, int offset)
        {
            writeShort(i & 0xffff, buf, offset);
            writeShort((i >> 16) & 0xffff, buf, offset + 2);
        }

        /*
         * Writes short integer in Intel byte order to a byte array, starting
         * at a given offset
         */
        private void writeShort(int s, byte[] buf, int offset)
        {
            buf[offset] = (byte)(s & 0xff);
            buf[offset + 1] = (byte)((s >> 8) & 0xff);
        }
    }

}

import static org.junit.Assert.*;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;

import org.junit.Test;

public class TestGzipCompressingInputStream
{

    @Test
    public void test() throws Exception
    {
        testCompressor("test1 test2 test3");
        testCompressor("1MB binary data",createTestPattern(1024*1024));
        for (int i=0;i<4096;i++)
        {
            testCompressor(i+" bytes of binary data",createTestPattern(i));
        }
    }

    protected byte[] createTestPattern(int size)
    {
        byte[] data=new byte[size];
        byte pattern=0;
        for (int i=0;i<size;i++)
        {
            data[i]=pattern++;
        }
        return data;
    }

    protected void testCompressor(String data) throws IOException
    {
        testCompressor("String: "+data,data.getBytes());
    }
    protected void testCompressor(String dataInfo, byte[] data) throws IOException
    {
        InputStream uncompressedIn=new ByteArrayInputStream(data);
        InputStream compressedIn=new GzipCompressingInputStream(uncompressedIn);
        InputStream uncompressedOut=new GZIPInputStream(compressedIn);

        byte[] result=StreamHelper.readBinaryStream(uncompressedOut);

        assertTrue("Test failed for: "+dataInfo,Arrays.equals(data,result));

    }

}
于 2013-10-11T20:13:03.607 回答
4

在流行的开源 ESB Mule中可以找到压缩输入流的工作示例:GZIPCompressorInputStream.

它使用DeflaterInputStreamJRE 提供的压缩包,预先添加gzip标头并附加 gzip 预告片(也称为页脚)。

不幸的是,它在CPA License下,这似乎并不常见。此外,似乎没有单元测试。

于 2013-12-12T15:14:29.333 回答
4

如果您不想将内容加载到大字节数组中并且需要真正的流式解决方案:

package x.y.z;

import org.apache.commons.io.IOUtils;

import java.io.*;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipOutputStream;

/**
 * Stream Compression Utility
 *
 * @author Thamme Gowda N
 */
public enum CompressionUtil {
    INSTANCE;

    public static final int NUM_THREADS = 5;
    private final ExecutorService pool;

    CompressionUtil(){
        this.pool = Executors.newFixedThreadPool(NUM_THREADS);
    }

    public static CompressionUtil getInstance(){
        return INSTANCE;
    }

    /**
     * Supported compression type names
     */
    public static enum CompressionType {
        GZIP,
        ZIP
    }

    /**
     * Wraps the given stream in a Compressor stream based on given type
     * @param sourceStream : Stream to be wrapped
     * @param type         : Compression type
     * @return source stream wrapped in a compressor stream
     * @throws IOException when some thing bad happens
     */
    public static OutputStream getCompressionWrapper(OutputStream sourceStream,
                                     CompressionType type) throws IOException {

        switch (type) {
            case GZIP:
                return new GZIPOutputStream(sourceStream);
            case ZIP:
                return new ZipOutputStream(sourceStream);
            default:
                throw new IllegalArgumentException("Possible values :"
                        + Arrays.toString(CompressionType.values()));
        }
    }

    /**
     * Gets Compressed Stream for given input Stream
     * @param sourceStream  : Input Stream to be compressed to
     * @param type: Compression types such as GZIP
     * @return  Compressed Stream
     * @throws IOException when some thing bad happens
     */
    public static InputStream getCompressedStream(final InputStream sourceStream,
                                    CompressionType type ) throws IOException {

        if(sourceStream == null) {
            throw new IllegalArgumentException("Source Stream cannot be NULL");
        }

        /**
         *  sourceStream --> zipperOutStream(->intermediateStream -)--> resultStream
         */
        final PipedInputStream resultStream = new PipedInputStream();
        final PipedOutputStream intermediateStream = new PipedOutputStream(resultStream);
        final OutputStream zipperOutStream = getCompressionWrapper(intermediateStream, type);

        Runnable copyTask = new Runnable() {

            @Override
            public void run() {
                try {
                    int c;
                    while((c = sourceStream.read()) >= 0) {
                        zipperOutStream.write(c);
                    }
                    zipperOutStream.flush();
                } catch (IOException e) {
                    IOUtils.closeQuietly(resultStream);  // close it on error case only
                    throw new RuntimeException(e);
                } finally {
                    // close source stream and intermediate streams
                    IOUtils.closeQuietly(sourceStream);
                    IOUtils.closeQuietly(zipperOutStream);
                    IOUtils.closeQuietly(intermediateStream);
                }
            }
        };
        getInstance().pool.submit(copyTask);
        return resultStream;
    }

    public static void main(String[] args) throws IOException {
        String input = "abcdefghij";
        InputStream sourceStream = new ByteArrayInputStream(input.getBytes());
        InputStream compressedStream =
                getCompressedStream(sourceStream, CompressionType.GZIP);

        GZIPInputStream decompressedStream = new GZIPInputStream(compressedStream);
        List<String> lines = IOUtils.readLines(decompressedStream);
        String output = lines.get(0);
        System.out.println("test passed ? " + input.equals(output));

    }
}
于 2014-07-10T09:42:45.857 回答
4

这是我编写的一个版本,其中没有 CRC/GZIP Magic cookie,因为它委托给 GZIPOutputStream。它还具有内存效率,因为它只使用足够的内存来缓冲压缩(42MB 文件使用 45k 缓冲区)。性能与压缩到内存相同。

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Compresses an InputStream in a memory-optimal, on-demand way only compressing enough to fill a buffer.
 * 
 * @author Ben La Monica
 */
public class GZIPCompressingInputStream extends InputStream {

    private InputStream in;
    private GZIPOutputStream gz;
    private OutputStream delegate;
    private byte[] buf = new byte[8192];
    private byte[] readBuf = new byte[8192];
    int read = 0;
    int write = 0;

    public GZIPCompressingInputStream(InputStream in) throws IOException {
        this.in = in;
        this.delegate = new OutputStream() {

            private void growBufferIfNeeded(int len) {
                if ((write + len) >= buf.length) {
                    // grow the array if we don't have enough space to fulfill the incoming data
                    byte[] newbuf = new byte[(buf.length + len) * 2];
                    System.arraycopy(buf, 0, newbuf, 0, buf.length);
                    buf = newbuf;
                }
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                growBufferIfNeeded(len);
                System.arraycopy(b, off, buf, write, len);
                write += len;
            }

            @Override
            public void write(int b) throws IOException {
                growBufferIfNeeded(1);
                buf[write++] = (byte) b;
            }
        };
        this.gz = new GZIPOutputStream(delegate); 
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        compressStream();
        int numBytes = Math.min(len, write-read);
        if (numBytes > 0) {
            System.arraycopy(buf, read, b, off, numBytes);
            read += numBytes;
        } else if (len > 0) {
            // if bytes were requested, but we have none, then we're at the end of the stream
            return -1;
        }
        return numBytes;
    }

    private void compressStream() throws IOException {
        // if the reader has caught up with the writer, then zero the positions out
        if (read == write) {
            read = 0;
            write = 0;
        }

        while (write == 0) {
            // feed the gzip stream data until it spits out a block
            int val = in.read(readBuf);
            if (val == -1) {
                // nothing left to do, we've hit the end of the stream. finalize and break out
                gz.close();
                break;
            } else if (val > 0) {
                gz.write(readBuf, 0, val);
            }
        }
    }

    @Override
    public int read() throws IOException {
        compressStream();
        if (write == 0) {
            // write should not be 0 if we were able to get data from compress stream, must mean we're at the end
            return -1;
        } else {
            // reading a single byte
            return buf[read++] & 0xFF;
        }
    }
}
于 2020-03-31T13:55:42.110 回答
3

看来我迟到了 3 年,但也许对某人有用。我的解决方案类似于@Michael Wyraz 的解决方案,唯一的区别是我的解决方案是基于FilterInputStream

import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class GZipInputStreamDeflater extends FilterInputStream {

    private static enum Stage {
        HEADER,
        DATA,
        FINALIZATION,
        TRAILER,
        FINISH
    }

    private GZipInputStreamDeflater.Stage stage = Stage.HEADER;

    private final Deflater deflater = new Deflater( Deflater.DEFLATED, true );
    private final CRC32 crc = new CRC32();

    /* GZIP header magic number */
    private final static int GZIP_MAGIC = 0x8b1f;

    private ByteArrayInputStream trailer = null;
    private ByteArrayInputStream header = new ByteArrayInputStream( new byte[] {
        (byte) GZIP_MAGIC, // Magic number (short)
        (byte) ( GZIP_MAGIC >> 8 ), // Magic number (short)
        Deflater.DEFLATED, // Compression method (CM)
        0, // Flags (FLG)
        0, // Modification time MTIME (int)
        0, // Modification time MTIME (int)
        0, // Modification time MTIME (int)
        0, // Modification time MTIME (int)
        0, // Extra flags (XFLG)
        0, // Operating system (OS)
    } );

    public GZipInputStreamDeflater(InputStream in) {
        super( in );
        crc.reset();
    }

    @Override
    public int read( byte[] b, int off, int len ) throws IOException {
        int read = -1;

        switch( stage ) {
            case FINISH:
                return -1;
            case HEADER:
                read = header.read( b, off, len );
                if( header.available() == 0 ) {
                    stage = Stage.DATA;
                }
                return read;
            case DATA:
                byte[] b2 = new byte[len];
                read = super.read( b2, 0, len );
                if( read <= 0 ) {
                    stage = Stage.FINALIZATION;
                    deflater.finish();
                    return 0;
                }
                else {
                    deflater.setInput( b2, 0, read );
                    crc.update( b2, 0, read );
                    read = 0;
                    while( !deflater.needsInput() && len - read > 0 ) {
                        read += deflater.deflate( b, off + read, len - read, Deflater.NO_FLUSH );
                    }
                    return read;
                }
            case FINALIZATION:
                if( deflater.finished() ) {
                    stage = Stage.TRAILER;

                    int crcVaue = (int) crc.getValue();
                    int totalIn = deflater.getTotalIn();

                    trailer = new ByteArrayInputStream( new byte[] {
                        (byte) ( crcVaue >> 0 ),
                        (byte) ( crcVaue >> 8 ),
                        (byte) ( crcVaue >> 16 ),
                        (byte) ( crcVaue >> 24 ),

                        (byte) ( totalIn >> 0 ),
                        (byte) ( totalIn >> 8 ),
                        (byte) ( totalIn >> 16 ),
                        (byte) ( totalIn >> 24 ),
                    } );

                    return 0;
                }
                else {
                    read = deflater.deflate( b, off, len, Deflater.FULL_FLUSH );
                    return read;
                }
            case TRAILER:
                read = trailer.read( b, off, len );
                if( trailer.available() == 0 ) {
                    stage = Stage.FINISH;
                }
                return read;
        }
        return -1;
    }

    @Override
    public void close( ) throws IOException {
        super.close();
        deflater.end();
        if( trailer != null ) {
            trailer.close();
        }
        header.close();
    }
}

用法:

AmazonS3Client s3client = new AmazonS3Client( ... );
try ( InputStream in = new GZipInputStreamDeflater( new URL( "http://....../very-big-file.csv" ).openStream() ); ) {
    PutObjectRequest putRequest = new PutObjectRequest( "BUCKET-NAME", "/object/key", in, new ObjectMetadata() );
    s3client.putObject( putRequest );
}
于 2015-01-21T16:46:07.627 回答
3

PipedOutputStream允许您写入 GZIPOutputStream 并通过 InputStream 公开该数据。与将整个数据流缓冲到数组或文件的其他解决方案不同,它具有固定的内存成本。唯一的问题是你不能从同一个线程读取和写入,你必须使用一个单独的线程。

private InputStream gzipInputStream(InputStream in) throws IOException {
    PipedInputStream zipped = new PipedInputStream();
    PipedOutputStream pipe = new PipedOutputStream(zipped);
    new Thread(
            () -> {
                try(OutputStream zipper = new GZIPOutputStream(pipe)){
                    IOUtils.copy(in, zipper);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
    ).start();
    return zipped;
}
于 2017-09-08T15:18:41.597 回答
2

要压缩数据,您需要GZIPOutputStream. 但是由于您需要像从 InputStream 中一样读取数据,因此您需要将 OutputStream 转换为 InputStream。您可以使用 getBytes() 这样做:

GZIPOutputStream gout = new GZIPOutputStream(out);
//... Code to read from your original uncompressed data and write to out.

//Convert to InputStream.
new ByteArrayInputStream(gout.getBytes());

但是这种方法有一个限制,即您需要首先读取所有数据 - 这意味着您必须有足够的内存来保存该缓冲区。

此线程中提到了使用管道的替代方法 -如何将 OutputStream 转换为 InputStream?

于 2012-06-14T15:35:26.960 回答
2

JRE中没有DeflatingGZIPInputStream。要使用“deflate”压缩格式进行放气,请使用java.util.zip.DeflaterInputStreamand java.util.zip.DeflaterOutputStream

public InputStream getCompressedStream(InputStream unCompressedStream) {
    return new DeflaterInputStream(unCompressedStream); 
}

您可以java.util.zip.DeflaterInputStream通过查看java.util.zip.GZIPOutputStream.

于 2012-06-14T15:43:39.173 回答
1

你不应该GZIPOutputStream在那种情况下看吗?

public OutputStream getCompressedStream(InputStream input) {
    OutputStream output = new GZIPOutputStream(new ByteArrayOutputStream()); 
    IOUtils.copy(input, output);
    return output;
}
于 2012-06-14T15:29:36.887 回答
1

您可以使用EasyStream

try(final InputStreamFromOutputStream<Void> isOs = new InputStreamFromOutputStream<Void>() {
    @Override
    protected void produce(final OutputStream dataSink) throws Exception {
        InputStream in = new GZIPInputStream(unCompressedStream);
        IOUtils.copy(in, dataSink);
    }
}) {        

    //You can use the compressed input stream here

} catch (final IOException e) {
    //Handle exceptions here
} 
于 2015-09-23T12:56:21.183 回答
0
public InputStream getCompressed( InputStream is ) throws IOException
{
    byte data[] = new byte[2048];
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GzipOutputStream zos = new GzipOutputStream( bos );
    BufferedInputStream entryStream = new BufferedInputStream( is, 2048);
    int count;
    while ( ( count = entryStream.read( data, 0, 2048) ) != -1 )
    {
        zos.write( data, 0, count );
    }
    entryStream.close();
    zos.close();

    return new ByteArrayInputStream( bos.toByteArray() );
}

参考 : zip 压缩

于 2020-01-08T02:14:16.713 回答
-3

我建议使用来自Apache Commons Compress的GzipCompressorInputStream

于 2016-08-02T14:21:24.113 回答