java - Java: converting UTF-8 to a String

Question

When I run the following program:

public static void main(String args[]) throws Exception
{
    byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
    String s = new String(str, "UTF-8");
}

Inspecting the value of s in jdb on Linux, I correctly get:

s = "어"

On Windows, I incorrectly get:

s = "?"

My byte sequence is the valid UTF-8 encoding of a Korean character, so why does it produce two completely different results?

score 3 · Accepted Answer

Answered 2012-10-02T21:22:16.447

score 1 · Accepted Answer

Answered 2012-10-02T21:20:11.713

score 1 · Accepted Answer

You get the correct string; it is the Windows console that does not display it correctly.

Here is a link to an article that discusses a way to make Java console produce correct Unicode output using JNI.
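As a lighter alternative to the JNI approach (this is not the method from the linked article), one option is to wrap the output stream in a PrintStream with an explicit UTF-8 encoding. The sketch below is only an illustration: the class name Utf8Console and the helper utf8 are hypothetical, and whether the character actually renders still depends on the console's code page and font, which Java does not control.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Hypothetical helper: wraps any stream in a PrintStream that encodes text as UTF-8
    // instead of the platform default charset.
    public static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Writes the bytes EC 96 B4 followed by a line separator; the console
        // must itself be set to UTF-8 for this to show up as the Hangul syllable.
        out.println("\uC5B4");
    }
}
```

This only guarantees which bytes leave the JVM. It does nothing about how jdb or the terminal decodes them.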

score 0 · Accepted Answer

JDB is displaying the data incorrectly. The code works the same on both Windows and Linux. Try running this more definitive test:

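import java.math.BigInteger;

public class Main {
    public static void main(String[] args) throws Exception {
        byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
        String s = new String(str, "UTF-8");
        // Print each UTF-16 code unit in hex, so the output does not depend on the console encoding
        for(int i=0; i<s.length(); i++) {
            System.out.println(BigInteger.valueOf((int)s.charAt(i)).toString(16));
        }
    }
}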

This prints the hex value of every UTF-16 code unit in the string. It correctly prints "c5b4" on both Windows and Linux, which is U+C5B4, the Hangul syllable 어.
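The same check can be written with code points and ASCII-only output, which makes it safe to read in any console or debugger. This is a minimal sketch rather than part of the original answer; the class name Utf8Check and the method describe are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Decodes the bytes as UTF-8 and returns the code points as "U+XXXX" tokens,
    // separated by spaces. The result contains only ASCII characters.
    public static String describe(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        System.out.println(describe(bytes)); // prints "U+C5B4"
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.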
