java - How can I easily compress and decompress Strings to and from byte arrays?

Question

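I have some strings of roughly 10K characters each. They contain a lot of repetition, since they are serialized JSON objects. I'd like to easily compress each one into a byte array, and decompress it again from a byte array.

What is the easiest way to do this? I'm looking for methods that let me write the following: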

String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);

score 27 · Accepted Answer

You can try this:

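import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

enum StringCompressor {
    ; // no instances: the enum only serves as a holder for static methods

    public static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the DeflaterOutputStream writes the remaining compressed data into baos.
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new AssertionError(e);
        }
        return baos.toByteArray();
    }

    public static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[8192];
            int len;
            // read() returns -1 once the end of the compressed stream is reached.
            while ((len = in.read(buffer)) != -1) {
                baos.write(buffer, 0, len);
            }
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }
}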
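
Since the strings in the question are highly repetitive, it may be worth trying a higher compression level than the default. The following is a minimal sketch of that idea, not part of the answer above: the class name `LevelledCompressor` and the `level` parameter are made up for illustration. The output is still in zlib format, so the `decompress` method above should read it unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper for illustration; the name is an assumption.
final class LevelledCompressor {

    // level is one of the Deflater constants, e.g. Deflater.BEST_SPEED
    // or Deflater.BEST_COMPRESSION.
    static byte[] compress(String text, int level) {
        Deflater deflater = new Deflater(level);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos, deflater)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // The stream only releases a Deflater it created itself,
            // so a caller-supplied one is ended explicitly here.
            deflater.end();
        }
        return baos.toByteArray();
    }
}
```

Whether `Deflater.BEST_COMPRESSION` actually beats the default on your JSON is something to measure; it trades CPU time for size.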

score 3 · Accepted Answer

Peter Lawrey's answer can be improved slightly by using this less complicated code for the decompress function:

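    // Also needs: import java.util.zip.InflaterOutputStream;
    //             import java.nio.charset.StandardCharsets;
    public static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // InflaterOutputStream decompresses whatever is written to it, so no read loop is needed.
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        } catch (IOException e) {
            throw new AssertionError(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }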
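
To check that this shorter decompress pairs correctly with a `DeflaterOutputStream`-based compress, here is a self-contained round-trip sketch. The class name `RoundTripDemo` and the sample JSON are invented for illustration; the sample only approximates the repetitive ~10K-character strings described in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

// Hypothetical demo class; names and sample data are assumptions.
final class RoundTripDemo {

    static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Builds a made-up, repetitive JSON-like string of roughly 10K characters.
    static String sampleJson() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 250; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i).append(",\"status\":\"active\",\"tags\":[]}");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String original = sampleJson();
        byte[] compressed = compress(original);
        String restored = decompress(compressed);
        System.out.println("chars: " + original.length()
                + ", compressed bytes: " + compressed.length);
        System.out.println("round trip ok: " + original.equals(restored));
    }
}
```

Running `main` prints the original and compressed sizes, which gives a rough feel for how much the repetition helps on data shaped like yours.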

score 1 · Accepted Answer

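I made a library to solve the problem of compressing generic strings (short ones in particular). It tries to compress the string with several algorithms (plain UTF-8, a 5-bit encoding for Latin letters, Huffman encoding, gzip for long strings) and picks whichever gives the shortest result (in the worst case it falls back to UTF-8 encoding, so you never risk ending up with something larger).

I hope it may be useful; here is the link: https://github.com/lithedream/lithestring

EDIT: I realize your strings are always "long". My library defaults to gzip for strings of that size, so I'm afraid I can't do any better for you.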
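
The "try several encodings and keep the shortest" idea described in this answer can be sketched in a few lines. This is not lithestring's API or byte format: the class name `ShortestEncoding`, the one-byte tag, and the choice of only two candidates (plain UTF-8 and DEFLATE) are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

// Hypothetical sketch; NOT the lithestring implementation or format.
final class ShortestEncoding {
    private static final byte RAW = 0;      // payload is plain UTF-8
    private static final byte DEFLATED = 1; // payload is DEFLATE-compressed UTF-8

    // Encodes with every candidate and keeps the shortest, prefixed by a tag byte.
    static byte[] encode(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        byte[] deflated = deflate(raw);
        boolean useDeflate = deflated.length < raw.length;
        byte[] payload = useDeflate ? deflated : raw;
        byte[] result = new byte[payload.length + 1];
        result[0] = useDeflate ? DEFLATED : RAW;
        System.arraycopy(payload, 0, result, 1, payload.length);
        return result;
    }

    // Reads the tag byte and reverses whichever encoding was chosen.
    static String decode(byte[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("missing tag byte");
        }
        byte[] payload = Arrays.copyOfRange(data, 1, data.length);
        switch (data[0]) {
            case RAW:
                return new String(payload, StandardCharsets.UTF_8);
            case DEFLATED:
                return new String(inflate(payload), StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown tag: " + data[0]);
        }
    }

    private static byte[] deflate(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    private static byte[] inflate(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }
}
```

In this sketch the worst case is one tag byte more than plain UTF-8, which is the price of letting `decode` know which encoding was used.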
