java - java: how to apply the right charset to java string

Question

I have two Chinese words, "果然有问题", encoded in GB2312. However, the text was wrongly decoded as ANSI and became "彆衄恀枙". In a hex editor, the bytes are B9FBC8BBD3D0CECACCE2.

I would like to use Java to correct the charset and store the text as UTF-16.

So I tried:

            String wrongstr = "彆衄恀枙";
            byte[] binary = wrongstr.getBytes("BIG5");
            System.out.printf("%s", new String(binary, "GB2312"));

but what I get is: 果?有问题

In Notepad++ I can see the correct words. However, I cannot get the correct hex (B9FBC8BBD3D0CECACCE2) with getBytes("BIG5"), getBytes("US_ANSI"), or getBytes("GB2312").

I don't know what is wrong. Please help, thanks in advance.

score 0 · Accepted Answer

I don't think that is ANSI, since ANSI has no Chinese characters. Big5 does, however:

String wrongstr = "湖馱";
byte[] binary = wrongstr.getBytes("Big5");
System.out.printf("%s", new String(binary, "GB2312"));

So this snippet gives the result you want.
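If the original bytes are still available (for example from the file or the hex dump in the question), it may be safer to decode them directly as GB2312 instead of round-tripping through a wrongly decoded string, because a byte pair with no mapping in Big5 is replaced on decoding and cannot be recovered afterwards. Below is a minimal sketch of that idea. The class name `CharsetRepair` and the helper `hexToBytes` are made up for illustration, and it assumes the JDK in use ships the `GB2312` and `Big5` charsets (standard JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRepair {
    // Hypothetical helper: turn a hex dump such as "B9FB..." into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decode the raw bytes with the charset they were originally written in.
    static String decodeGb2312(byte[] raw) {
        return new String(raw, Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        byte[] raw = hexToBytes("B9FBC8BBD3D0CECACCE2");

        // Correct decoding: interpret the bytes as GB2312.
        String text = decodeGb2312(raw);
        System.out.println(text);

        // A Java String is UTF-16 internally; encode explicitly to store it.
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf16.length + " bytes as UTF-16BE");

        // Check whether a Big5 round trip preserves the original bytes.
        Charset big5 = Charset.forName("Big5");
        byte[] roundTrip = new String(raw, big5).getBytes(big5);
        System.out.println("Big5 round trip lossless: " + Arrays.equals(raw, roundTrip));
    }
}
```

If the last line reports `false`, at least one byte pair had no Big5 mapping, which would explain the `?` in the question's output.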
