java - How to transform the input stream when using stax2 XMLInputFactory2?

Question

I am parsing a large XML file (500-800 MB) with stax2, like this:

XMLStreamReader2 reader = (XMLStreamReader2) xmlif2.createXMLStreamReader(fileName, new FileInputStream(fileName));

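I convert it to a specific CSV format and have run into the following problem. Some text nodes contain the “” sequence. In the output file it must be replaced with the Cyrillic letter “Ё”. But when the parser encounters that sequence, it throws an exception: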

[com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x1 at [row,col,system-id]

I get the same exception with plain StAX.

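Can I set up some kind of conversion on the XML stream reader so that the sequence is replaced before it is parsed automatically? I could create an intermediate file with everything already replaced and then parse that, but that is not a good idea.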

score 0 · Accepted Answer

The error message indicates that your XML is not well-formed: either its encoding is broken or, as it sounds here, it contains a character reference for the Unicode character with value 0x1. That is not allowed in XML 1.0, although it would be legal in XML 1.1. But perhaps the XML document does not declare "version='1.1'" in its XML declaration?
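If the file cannot be fixed at the source, one way to avoid an intermediate file is to wrap the input in a Reader that rewrites the offending sequence on the fly and hand that Reader to the factory. The following is only a rough sketch under some assumptions: `ReplacingReader` is a hypothetical helper (not part of Woodstox or stax2), the offending sequence is assumed to be spelled `&#x1;` (the question does not show its exact form), and the demo uses the JDK's built-in StAX factory so it runs without extra dependencies.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: replaces every occurrence of a literal string while streaming. */
final class ReplacingReader extends Reader {
    private final PushbackReader source;
    private final char[] target;
    private final String replacement;
    private int pendingPos; // index of the next replacement char still to be emitted

    ReplacingReader(Reader in, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        // The pushback buffer only ever has to hold a failed partial match.
        this.source = new PushbackReader(in, target.length());
        this.target = target.toCharArray();
        this.replacement = replacement;
        this.pendingPos = replacement.length(); // nothing pending yet
    }

    /** Returns the next output character, or -1 at end of input. */
    private int next() throws IOException {
        while (true) {
            if (pendingPos < replacement.length()) {
                return replacement.charAt(pendingPos++);
            }
            int first = source.read();
            if (first != target[0] || !matchesRest()) {
                return first; // ordinary character, or -1 at end of input
            }
            pendingPos = 0; // full match consumed: emit the replacement instead
        }
    }

    /** Tries to read the rest of the target; pushes the characters back on a mismatch. */
    private boolean matchesRest() throws IOException {
        char[] seen = new char[target.length - 1];
        int n = 0;
        boolean matched = true;
        while (n < seen.length) {
            int c = source.read();
            if (c == -1) { matched = false; break; }
            seen[n++] = (char) c;
            if (c != target[n]) { matched = false; break; }
        }
        if (!matched) {
            source.unread(seen, 0, n);
        }
        return matched;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = next();
            if (c == -1) break;
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // Assumed spelling of the offending reference; \u0401 is Cyrillic capital letter IO.
        Reader fixed = new ReplacingReader(new StringReader("<r>a&#x1;b</r>"), "&#x1;", "\u0401");
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(fixed);
        reader.next(); // advance to the <r> start element
        System.out.println(reader.getElementText());
    }
}
```

For the real file you would probably wrap something like `new BufferedReader(new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))` and pass the result to `xmlif2.createXMLStreamReader(...)`. Note that handing the parser a Reader bypasses its own encoding detection, so the charset has to be known up front (UTF-8 here is an assumption). The replacement is purely textual, so it also rewrites the sequence inside comments or CDATA sections.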
