java - MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence when using Hebrew characters

Question

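I'm trying to parse an XML file that contains Hebrew characters. I know the file is otherwise well-formed, because if I export it without Hebrew characters (from a different piece of software) it parses fine.

I've tried many things, but I always get this error: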

MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

My latest attempt was to open it with a FileInputStream and specify the encoding:

DocumentBuilder db = dbf.newDocumentBuilder();
document = db.parse(new FileInputStream(new File(xmlFileName)), "Cp1252");

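(Cp1252 is the encoding that worked for me in a different application), but I got the same result.

I also tried using a ByteArray, and that made no difference.

Any suggestions?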

score 7 · Accepted Answer

If you know the file's correct encoding and it is not "utf-8", you can declare it in the XML header:

<?xml version="1.0" encoding="[correct encoding here]" ?>

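Or hand it to the parser as a Reader:

// DocumentBuilder.parse has no Reader overload, so wrap the reader in an org.xml.sax.InputSource
db.parse(new InputSource(new InputStreamReader(new FileInputStream(new File(xmlFileName)), "[correct encoding here]")));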
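For reference, here is a minimal sketch of both options on an in-memory document. One thing worth knowing about the attempt in the question: the second argument of `DocumentBuilder.parse(InputStream, String)` is a system ID (a base URI), not an encoding, so passing "Cp1252" there does not change how the bytes are decoded. The class and method names below are made up for illustration, and `windows-1255` is only an assumed encoding for a Hebrew file; use whatever encoding the file was actually written in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingSketch {

    // Option 1: the XML declaration names the encoding, so the raw bytes
    // can be handed to the parser and it decodes them itself.
    public static Document parseDeclared(byte[] xmlBytes) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(xmlBytes));
    }

    // Option 2: decode the bytes ourselves and give the parser characters,
    // wrapped in an InputSource (DocumentBuilder has no parse(Reader) overload).
    public static Document parseWithCharset(byte[] xmlBytes, Charset charset) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset)) {
            return db.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the file is in windows-1255 and the JDK ships that charset.
        Charset hebrew = Charset.forName("windows-1255");
        String body = "<word>\u05e9\u05dc\u05d5\u05dd</word>";

        byte[] declared = ("<?xml version=\"1.0\" encoding=\"windows-1255\"?>" + body).getBytes(hebrew);
        byte[] undeclared = ("<?xml version=\"1.0\"?>" + body).getBytes(hebrew);

        Document viaHeader = parseDeclared(declared);
        Document viaReader = parseWithCharset(undeclared, hebrew);

        // Both routes should yield the same four Hebrew characters.
        String a = viaHeader.getDocumentElement().getTextContent();
        String b = viaReader.getDocumentElement().getTextContent();
        System.out.println(a.equals(b) + " " + a.length());
    }
}
```

The Reader route is handy when you cannot edit the file, since the parser then works from the characters you decoded and does not have to guess the byte encoding.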

score 1 · Accepted Answer

The solution is simple: read the content as UTF-8 and override the SAX input source.

File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream, "UTF-8");

InputSource is = new InputSource(reader);
// is.setEncoding("UTF-8"); -> This line causes error! Content is not allowed in prolog

saxParser.parse(is, handler);

You can read the full example here: http://www.mkyong.com/java/how-to-read-utf-8-xml-file-in-java-sax-parser/
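The snippet above leaves out the parser and handler setup, so here is a self-contained sketch of the same idea. It assumes the content really is UTF-8 and uses an in-memory document instead of c:\file-utf.xml; `SaxUtf8Sketch`, `TextCollector` and `readText` are hypothetical names for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxUtf8Sketch {

    // Hypothetical handler that just accumulates all character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Decodes the bytes as UTF-8 and feeds the resulting characters to SAX.
    public static String readText(byte[] utf8Xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Xml), StandardCharsets.UTF_8)) {
            saxParser.parse(new InputSource(reader), handler);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<word>\u05e9\u05dc\u05d5\u05dd</word>";
        String text = readText(xml.getBytes(StandardCharsets.UTF_8));
        // Print the code points rather than the text, to avoid console encoding issues.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Note this only helps if the file is actually UTF-8. If it was saved in another encoding, pass that charset to the InputStreamReader instead.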
