java - GZIPInputStream 和字符集

Question

我有一个包含拉丁文、西里尔文和中文字符的文本。我尝试使用 GZIPInputStream压缩字符串（over bytes[]）并解压缩它。GZIPOutputStream但我无法将所有字符转换回原始字符。有些显示为?.

我认为 UTF-16 会完成这项工作。

有什么帮助吗？

问候

这是我的代码：

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

public class CompressUncompressStrings {

    public static void main(String[] args) throws UnsupportedEncodingException {

        String sTestString="äöüäöü 长安";
        System.out.println(sTestString);
        byte bcompressed[]=compress(sTestString.getBytes("UTF-16"));
        //byte bcompressed[]=compress(sTestString.getBytes());
        String sDecompressed=decompress(bcompressed);
        System.out.println(sDecompressed);
    }
    public static byte[] compress(byte[] content){
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        try{
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
            gzipOutputStream.write(content);
            gzipOutputStream.close();
        } catch(IOException e){
            throw new RuntimeException(e);
        }
        return byteArrayOutputStream.toByteArray();
    }
    public static String decompress(byte[] contentBytes){

        String sReturn="";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try{
            GZIPInputStream gzipInputStream =new GZIPInputStream(new ByteArrayInputStream(contentBytes));
             ByteArrayOutputStream baos = new ByteArrayOutputStream();
             for (int value = 0; value != -1;) {
                 value = gzipInputStream.read();
                 if (value != -1) {
                     baos.write(value);
                 }
             }
             gzipInputStream.close();
             baos.close();
             sReturn=new String(baos.toByteArray(), "UTF-16");
             return sReturn;
                 // Ende Neu

        } catch(IOException e){
            throw new RuntimeException(e);
        }
    }
}

score 1 · Accepted Answer

我怀疑只是控制台有问题。我尝试了上面的代码，虽然它没有正确打印出任何字符，但当我测试字符串的往返时，它很好：

System.out.println(sDecompressed.equals(sTestString)); // Prints true

这在你的机器上有什么作用？

score 1 · Accepted Answer

Displaying an non ASCII character on a console output is not easy. Assuming you're using Windows as your operating system (since the command line doesn't support Unicode by default), you can change your active code page number (using the chcp command). I don't know how it's done through code but I suggest running the code on command line.

This chcp value 65001 changes to tell windows to use UTF-8 on it's console (you can view a discussion here).

I hope this helps.

java - GZIPInputStream 和字符集

2 回答 2

Related

Reference