xml - Invalid byte 1 of 1-byte UTF-8 sequence

Question

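I have a program that takes two XML files and merges them into one. While doing this, I convert "and " to "and ’". Leaving aside why I do this, here is the code snippet. When I remove the "’", the error goes away, which is why I am pasting it here.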

convertedString = replace(convertedString, (String)"and ", 
                (String)"and &#8217;");
convertedString = replace(convertedString, (String)"&quot;", 
                (String)"\\\"");
convertedString = StringEscapeUtils.unescapeHtml(convertedString);

The document is written out with this printDocument method:

private static void printDocument(Document doc, OutputStream out) 
    throws IOException, TransformerException 
    {     
        TransformerFactory tf = TransformerFactory.newInstance();     
        Transformer transformer = tf.newTransformer();     
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");     
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");     
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");     
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");     
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-" +
                "amount", "4");      
        transformer.transform(new DOMSource(doc),           
                new StreamResult(new OutputStreamWriter(out, "UTF-8"))); 
    }

When I run my program I get:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)

I suspected it might be related to the UTF-8 setting in the printDocument() method, but changing it to ISO-8859-1 does not help.

Can anyone help me solve this? Much appreciated.

score 1 · Accepted Answer

If you are using Eclipse, try navigating to Preferences / General / Workspace, then change "Text file encoding" to UTF-8.

score 0 · Accepted Answer

The XML parser is trying to interpret the input as UTF-8, but the input is not UTF-8.
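To illustrate the mismatch, here is a minimal sketch. It assumes, purely for illustration, that the merged text was encoded as windows-1252, where "’" is the single byte 0x92; the question does not say which encoding was actually involved. The class and method names (EncodingMismatchDemo, sampleBytes, parsesAsRawBytes, rootText) are hypothetical. The first parse hands the raw bytes to the parser, which trusts the UTF-8 declaration. The second decodes the bytes with the assumed charset first and passes characters to the parser, so the encoding declaration no longer matters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EncodingMismatchDemo {

    // Assumption for illustration only: the text was produced in windows-1252.
    static final Charset ASSUMED_SOURCE = Charset.forName("windows-1252");

    // Builds XML that claims to be UTF-8 but whose bytes are windows-1252,
    // so the right single quote (U+2019) becomes the lone byte 0x92.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>and \u2019</msg>";
        return xml.getBytes(ASSUMED_SOURCE);
    }

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Lets the parser decode the bytes itself, following the XML declaration.
    // Returns false if the parser rejects the input.
    static boolean parsesAsRawBytes(byte[] xml) throws ParserConfigurationException {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | IOException e) {
            // Depending on the JDK, the malformed byte surfaces as either type.
            return false;
        }
    }

    // Decodes the bytes with a known charset, then parses the characters.
    static String rootText(byte[] xml, Charset charset)
            throws ParserConfigurationException, SAXException, IOException {
        String decoded = new String(xml, charset);
        Document doc = newBuilder().parse(new InputSource(new StringReader(decoded)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleBytes();
        System.out.println("raw bytes parse OK: " + parsesAsRawBytes(xml));
        System.out.println("decoded root text:  " + rootText(xml, ASSUMED_SOURCE));
    }
}
```

In a real program the more durable fix is usually to make sure the bytes that reach the parser really are in the encoding the document declares, for example by writing the merged string out with an explicit UTF-8 encoder rather than the platform default.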

answered 2012-08-31T12:10:10.003
