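Possible duplicate:
Parsing a UTF-8 encoded XML file
I am parsing a UTF-8 encoded XML file that contains some Arabic characters. Everything works except that the Arabic characters are not displayed; instead, strange characters like these show up: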
ÙرÙÙ
Here is the link to the XML file being parsed: http://212.12.165.44:7201/UniNews121.xml
Here is the code:
    public String getXmlFromUrl(String url) {
        try {
            return new AsyncTask<String, Void, String>() {
                @Override
                protected String doInBackground(String... params) {
                    //String xml = null;
                    try {
                        DefaultHttpClient httpClient = new DefaultHttpClient();
                        httpClient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET, "UTF-8");
                        HttpGet httpPost = new HttpGet(params[0]);
                        HttpResponse httpResponse = httpClient.execute(httpPost);
                        HttpEntity httpEntity = httpResponse.getEntity();
                        xml = new String(EntityUtils.toString(httpEntity).getBytes(), "UTF-8");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    // Just to remove the BOM element
                    xml = xml.substring(3);
                    // Here I am printing the XML, and the Arabic characters are malformed
                    Log.i("DEMO", xml);
                    return xml;
                }
            }.execute(url).get();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ExecutionException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return xml;
    }
Please note that no error occurs and everything else works; only the Arabic characters are malformed.
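To make the symptom concrete, here is a minimal sketch in plain Java (no Android or HttpClient classes) of what may be happening. It assumes the response body is UTF-8 with a leading BOM and that it gets decoded as ISO-8859-1 somewhere along the way; that is a guess, not something confirmed. The class name `CharsetDemo`, its two helper methods, and the sample word are hypothetical and not part of the code above.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Decode the raw bytes as UTF-8 and drop a leading BOM (U+FEFF) if one is present.
    static String decodeUtf8(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Reproduce the symptom: the same bytes read as ISO-8859-1 (the assumed wrong charset).
    static String misdecode(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample: a BOM followed by a short Arabic word, as UTF-8 bytes.
        byte[] body = "\uFEFF\u0645\u0631\u062D\u0628\u0627".getBytes(StandardCharsets.UTF_8);

        System.out.println("misdecoded: " + misdecode(body));
        System.out.println("decoded:    " + decodeUtf8(body));
    }
}
```

If that assumption holds, calling `getBytes()` with the platform default charset and decoding the result as UTF-8 would only round-trip the already garbled string, and `substring(3)` would be removing the three characters the BOM was misread as. Passing the charset explicitly, for example `EntityUtils.toString(httpEntity, "UTF-8")`, might avoid this, although that argument is only a default that applies when the response does not declare a charset itself.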
Thanks for your help, but please be specific in your answers.