java - XML parsing using DocumentBuilder

Question

I am trying to parse XML into a map of key-value pairs, as shown below.

Sample XML document:

<Students>
    <StudentA>
        <Id>123</Id>
        <Address>123 W </Address>
        <Courses>
            <Course1>CS203</Course1>
            <Course2>CS206</Course2>
        </Courses>
    </StudentA>
    <StudentB>
        <Id>124</Id>
        <Address>124 W </Address>
        <Courses>
            <Course1>CS202</Course1>
            <Course2>CS204</Course2>
        </Courses>
    </StudentB>
</Students>

XML parser code:

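import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Dotted path of the ancestors of the element being processed; shared across the recursive calls.
private String parentNode = "";

/**
 * Parse the given xml data.
 * @param xmlString The xml string to be parsed.
 * @return Non-null map of element values keyed by their dotted path, may be empty.
 */
Map<String, String> parseXML(final String xmlString)
{
    // Declared outside the try block so that it is still in scope at the return statement.
    Map<String, String> data = new LinkedHashMap<String, String>();

    parentNode = "";
    try
    {
        // Parse the characters directly; no String-to-bytes conversion is needed.
        final InputSource inputSource = new InputSource(new StringReader(xmlString));
        final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setNamespaceAware(true);
        final DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        final Document document = documentBuilder.parse(inputSource);
        data = createMapOfAttributeValuesKeyedByName(document.getDocumentElement());
    }
    catch (final Exception exception)
    {
        System.out.println("Exception:" + exception);
    }

    return data;
}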

/**
 * A recursive method which will loop through all the xml nodes.
 * @param node The node.
 * @return Non-null map of element values keyed by dotted path, may be empty.
 */
Map<String, String> createMapOfAttributeValuesKeyedByName(final Node node)
{
    final Map<String, String> attributeValuesKeyedByName = new LinkedHashMap<String, String>();
    final NodeList nodeList = node.getChildNodes();
    for (int index = 0; index < nodeList.getLength(); index++)
    {
        final Node currentNode = nodeList.item(index);
        // Both branches test the type of node's first child, not the type of currentNode.
        if (node.getFirstChild() != null && node.getFirstChild().getNodeType() == Node.ELEMENT_NODE)
        {
            // First child is an element: record currentNode's ancestors and recurse into it.
            parentNode = getAncestralOrigin(currentNode);
            attributeValuesKeyedByName.putAll(createMapOfAttributeValuesKeyedByName(currentNode));
        }
        else if (node.getFirstChild() != null && node.getFirstChild().getNodeType() == Node.TEXT_NODE)
        {
            // First child is text: treat node as a leaf and store its trimmed text under its path.
            final String attributeName = parentNode.length() > 0 ? parentNode + "." + node.getNodeName().trim() : node.getNodeName().trim();
            final String attributeValue = node.getTextContent().trim();
            attributeValuesKeyedByName.put(attributeName, attributeValue);
            parentNode = "";
        }
    }

    return attributeValuesKeyedByName;
}

/**
 * Parses a given node and finds all its ancestors.
 * @param node The node whose ancestors have to be found.
 * @return A non-null but possibly empty string built from the names of the node's ancestors.
 */
 */
final String getAncestralOrigin(final Node node)
{
    String ancestralOrigin = "";
    final Node currentParentNode = node.getParentNode();
    if (currentParentNode != null && currentParentNode.getNodeType() != Node.DOCUMENT_NODE)
    {
        ancestralOrigin = currentParentNode.getNodeName();
        final String ancestor = getAncestralOrigin(currentParentNode);
        if (ancestor.length() > 0)
        {
            ancestralOrigin = ancestor + "." + ancestralOrigin;
        }
    }
    return ancestralOrigin;
}

The output map is:

Key:[Students.StudentA.Id], Value:[123]
Key:[Students.StudentA.Address], Value:[123 W]
Key:[Students.StudentA.Courses.Course1], Value:[CS203]
Key:[Students.StudentA.Courses.Course2], Value:[CS206]
Key:[Students.StudentB.Id], Value:[124]
Key:[Students.StudentB.Address], Value:[124 W]
Key:[Students.StudentB.Courses.Course1], Value:[CS202]
Key:[Students.StudentB.Courses.Course2], Value:[CS204]

However, this output is only produced when the file is read with

final BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(url.getFile().replaceAll("%20", " "))));

If the same file is read with

DataInputStream is = new DataInputStream(new FileInputStream(new File(url.getFile().replaceAll("%20", " "))));

the output is different: it picks up all the CR and LF characters in the XML document.

Key:[Students], Value:[123 123 W
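A minimal sketch of where that value comes from. The class WhitespaceDemo and its methods are hypothetical names for illustration. A default (non-validating) DocumentBuilder keeps the indentation and line breaks between tags as text nodes, so in the indented file the first child of <Students> is a whitespace TEXT_NODE. createMapOfAttributeValuesKeyedByName tests node.getFirstChild(), so it takes the TEXT_NODE branch for <Students> itself and stores getTextContent() of the whole element, which is every descendant's text, under the key Students.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // Parses an XML string with a default (non-validating) DocumentBuilder.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Node type of the root element's first child (ELEMENT_NODE or TEXT_NODE here).
    static short firstChildType(String xml) {
        return parse(xml).getDocumentElement().getFirstChild().getNodeType();
    }

    public static void main(String[] args) {
        String compact = "<Students><StudentA><Id>123</Id></StudentA></Students>";
        String indented = "<Students>\n    <StudentA>\n        <Id>123</Id>\n    </StudentA>\n</Students>";
        // No whitespace between tags: the first child is the <StudentA> element.
        System.out.println(firstChildType(compact) == Node.ELEMENT_NODE);
        // Indented: the first child is the whitespace text node "\n    ".
        System.out.println(firstChildType(indented) == Node.TEXT_NODE);
    }
}
```

So the same parser gives different trees for the same logical document, depending only on whether whitespace survives between the tags.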

I am using a dependency jar that reads the XML file with DataInputStream.

I always assumed my XML parser would handle CR/LF/newlines, but it looks like it does not. Before parsing, I replace all CR, LF and newline characters with an empty string.
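Replacing CR/LF in the raw string also removes line breaks that belong to real values. One alternative is to work on the parsed DOM instead and remove only text nodes that consist entirely of whitespace. The names WhitespaceStripper, removeWhitespaceNodes and parseWithoutWhitespace are hypothetical. The JAXP option DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) looks like it should do this, but it only takes effect when the parser is validating against a DTD that declares element-only content, so it does not help for arbitrary XML without a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    // Recursively removes text nodes that contain only whitespace (spaces, tabs, CR, LF).
    static void removeWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards: the NodeList is live, so a removal shifts later indices.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespaceNodes(child);
            }
        }
    }

    // Parses the XML and strips whitespace-only text nodes below the root element.
    static Document parseWithoutWhitespace(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            removeWhitespaceNodes(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Text nodes that mix whitespace with real characters, such as the content of <Address>123 W </Address>, are kept unchanged; an element whose only content is whitespace ends up with no children at all.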

But I would like to know whether there is another XML parser that handles this by itself. Also, why does BufferedReader skip CR/LF and newlines while DataInputStream does not?
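On the second question: BufferedReader.readLine() returns each line without its line terminator (\n, \r or \r\n), so code that rebuilds the document by appending lines drops every CR and LF. DataInputStream, like any InputStream, returns the bytes exactly as stored, terminators included. A small sketch of the readLine() behavior follows (LineTerminatorDemo is a hypothetical name). Indentation at the start of a line is kept, so whether the result is entirely free of whitespace between tags depends on what the reading code does with each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineTerminatorDemo {
    // Joins lines the way a typical readLine() loop does.
    static String joinLines(String text) {
        StringBuilder joined = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                joined.append(line); // 'line' never contains '\r' or '\n'
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String xml = "<Students>\r\n  <Id>123</Id>\r\n</Students>";
        System.out.println(joinLines(xml)); // prints <Students>  <Id>123</Id></Students>
    }
}
```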

Also, is there a better way to find the ancestors of a child tag? I need them to make the keys unique.
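One alternative to walking back up with getParentNode() is to build the path on the way down and pass it into the recursive call, so each element already knows its full key. The sketch below uses hypothetical names (PathFlattener, flatten, collect). It also decides leaf versus container by looking for child elements, so whitespace text between tags no longer matters.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathFlattener {
    // Parses the XML and returns leaf-element text keyed by dotted path, in document order.
    static Map<String, String> flatten(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        collect(root, root.getNodeName(), result);
        return result;
    }

    // 'path' is the full dotted name of 'element', built on the way down,
    // so no walk back up through getParentNode() is needed.
    static void collect(Element element, String path, Map<String, String> out) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only element children matter; whitespace text between tags is skipped.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) child, path + "." + child.getNodeName(), out);
            }
        }
        if (!hasChildElements) {
            // Leaf element: its trimmed text is the value.
            out.put(path, element.getTextContent().trim());
        }
    }
}
```

For the sample document this yields the same eight keys as the output listed in the question, whether or not the file is indented. Text that sits directly inside an element that also has child elements (mixed content) is ignored in this sketch.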

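The XML will stay as it is and cannot be changed. Also, the real XML will differ from the one shown here: it will be generic XML whose tags change, so I am trying to write a generic XML parser that parses the child tags and puts them into a map.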

Child tags can repeat, so I use the path of each child tag to make its key unique.
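A path stays unique only as long as no element has two children with the same tag name. With <Course>CS203</Course><Course>CS206</Course> under one parent, the second value would overwrite the first. If that can happen, one option is to append an XPath-style 1-based position to repeated names. This is a sketch with hypothetical names (IndexedPathFlattener, flatten, collect), not something the question's code does.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IndexedPathFlattener {
    static Map<String, String> flatten(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        collect(root, root.getNodeName(), result);
        return result;
    }

    static void collect(Element element, String path, Map<String, String> out) {
        NodeList children = element.getChildNodes();
        // First pass: count how many child elements share each tag name.
        Map<String, Integer> totals = new HashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                totals.merge(child.getNodeName(), 1, Integer::sum);
            }
        }
        if (totals.isEmpty()) {
            // No child elements: a leaf, so store its trimmed text.
            out.put(path, element.getTextContent().trim());
            return;
        }
        // Second pass: recurse, adding [n] only to names that occur more than once.
        Map<String, Integer> seen = new HashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName();
                String step = name;
                if (totals.get(name) > 1) {
                    step = name + "[" + seen.merge(name, 1, Integer::sum) + "]";
                }
                collect((Element) child, path + "." + step, out);
            }
        }
    }
}
```

Names that occur once keep the plain form (for example Students.StudentA.Id), so keys for documents without repeated siblings look exactly like the ones in the question.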

Also, is there a way to recursively parse the XML of these tags (StudentA/StudentB) while dropping the parent tag Students?
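One reading of this question is that the keys should start at StudentA/StudentB rather than at Students. The following sketch (RootlessFlattener and its methods are hypothetical names) starts the recursion at each child element of the root and seeds the path with that child's own name. The root's tag name never appears in the keys, so it works for any root element.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootlessFlattener {
    // Flattens the document with keys that start below the root element.
    static Map<String, String> flatten(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        NodeList topLevel = root.getChildNodes();
        for (int i = 0; i < topLevel.getLength(); i++) {
            Node child = topLevel.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Seed the path with the child's name (e.g. "StudentA"), not the root's.
                collect((Element) child, child.getNodeName(), result);
            }
        }
        return result;
    }

    // Recurses into child elements; elements without child elements become entries.
    static void collect(Element element, String path, Map<String, String> out) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) child, path + "." + child.getNodeName(), out);
            }
        }
        if (!hasChildElements) {
            out.put(path, element.getTextContent().trim());
        }
    }
}
```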

Note: the XML format changes, and the XML I parse may be different for each file. So I really cannot parse it by, for example, getting the children of StudentA.

score 0 · Accepted Answer

After your long description, my understanding is that you want to know whether there are other, better ways to parse XML.

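The answer is yes, there are better ways to parse XML. Use StAX or SAX: both are streaming APIs that do not build the whole document tree in memory, so they are faster and more memory-efficient. To learn more about JAXP, read the Java Tutorials.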
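To illustrate the StAX suggestion, here is a sketch of the same path-to-value flattening with javax.xml.stream.XMLStreamReader, which reads the document as a stream of events instead of building a DOM tree. The class and method names are hypothetical, and attributes and mixed content are ignored.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxFlattener {
    static Map<String, String> flatten(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        Deque<String> path = new ArrayDeque<>();  // element names from the root down to the current one
        StringBuilder text = new StringBuilder(); // character data seen since the last start tag
        boolean leaf = false;                     // true until a child element of the current one ends
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            path.addLast(reader.getLocalName());
                            text.setLength(0);
                            leaf = true;
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                            text.append(reader.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            if (leaf) {
                                // No child element was seen: store the trimmed text under the dotted path.
                                result.put(String.join(".", path), text.toString().trim());
                            }
                            leaf = false; // the parent now has at least one child element
                            path.removeLast();
                            break;
                        default:
                            break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }
}
```

Because the path is a stack that is pushed on each start tag and popped on each end tag, no ancestor lookup is needed, and whitespace between tags never becomes a value because it is cleared at the next start tag.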

score 0 · Accepted Answer

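DataInputStream is designed to read only what was written with DataOutputStream, i.e. primitive Java values in a machine-independent binary format. It is not meant for reading text input.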
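A related point: an XML parser does not need a Reader or a DataInputStream at all. If you pass the raw byte stream straight to DocumentBuilder.parse, the parser detects the encoding from the XML declaration. The sketch below uses a hypothetical class name, XmlFileLoader. It also converts the resource URL with url.toURI(), which decodes %20 and other escapes, instead of using replaceAll("%20", " ").

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlFileLoader {
    // Converts a file: URL to a Path; URI decoding turns "%20" back into a space.
    static Path toPath(URL url) {
        try {
            return Paths.get(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    // Hands raw bytes to the parser, which works out the encoding itself.
    static Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Opens the file as a byte stream and parses it.
    static Document parseFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A DataInputStream is itself an InputStream, so it could also be passed to parse unchanged. Either way, the parser still produces text nodes for the whitespace between tags, and the code that walks the tree has to skip them.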
