据我所知,在 String.getBytes(charset) 中,参数 charset 表示该方法返回编码为给定字符集的字符串的字节。
在 new String(bytes, charset) 中,第二个参数 charset 表示该方法将字节解码为给定的字符集并返回解码结果。
根据以上,据我了解,两种不同方法的 charset 参数必须相同,这样 new String(bytes, charset) 才能返回正确的字符串。(我想这就是我所缺少的。)
我有一个错误解码的字符串,我用这个测试了以下代码:
String originalStr = "Å×½ºÆ®"; // 테스트
String [] charSet = {"utf-8","euc-kr","ksc5601","iso-8859-1","x-windows-949"};
for (int i=0; i<charSet.length; i++) {
for (int j=0; j<charSet.length; j++) {
try {
System.out.println("[" + charSet[i] +"," + charSet[j] +"] = " + new String(originalStr.getBytes(charSet[i]), charSet[j]));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}
}
输出是:
[utf-8,utf-8] = Å×½ºÆ®
[utf-8,euc-kr] = ��쩍쨘�짰
[utf-8,ksc5601] = ��쩍쨘�짰
[utf-8,iso-8859-1] = Å×½ºÆ®
[utf-8,x-windows-949] = 횇횞쩍쨘횈짰
[euc-kr,utf-8] = ?����������
[euc-kr,euc-kr] = ?×½ºÆ®
[euc-kr,ksc5601] = ?×½ºÆ®
[euc-kr,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[euc-kr,x-windows-949] = ?×½ºÆ®
[ksc5601,utf-8] = ?����������
[ksc5601,euc-kr] = ?×½ºÆ®
[ksc5601,ksc5601] = ?×½ºÆ®
[ksc5601,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[ksc5601,x-windows-949] = ?×½ºÆ®
[iso-8859-1,utf-8] = ��Ʈ
[iso-8859-1,euc-kr] = 테스트
[iso-8859-1,ksc5601] = 테스트
[iso-8859-1,iso-8859-1] = Å×½ºÆ®
[iso-8859-1,x-windows-949] = 테스트
[x-windows-949,utf-8] = ?����������
[x-windows-949,euc-kr] = ?×½ºÆ®
[x-windows-949,ksc5601] = ?×½ºÆ®
[x-windows-949,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[x-windows-949,x-windows-949] = ?×½ºÆ®
如您所见,我弄清楚了获取原始字符串的方法:
[iso-8859-1,euc-kr] = 테스트
[iso-8859-1,ksc5601] = 테스트
[iso-8859-1,x-windows-949] = 테스트
怎么可能?如何将字符串正确编码和解码为不同的字符集?