java - XML parsing: getting the data between tags along with parent information

Question

I am trying to write a generic XML parser that parses all XML tags and collects each tag and its value as a key-value pair in a map. Sample XML:

<?xml version="1.0"?>
<company>
    <staff>
        <firstname>Kevin</firstname>
        <lastname>Gay</lastname>
        <salary>50000</salary>
    </staff>
</company>

The output is as follows:

NodeName:[company] Value:[

        Kevin
        Gay
        50000

]
NodeName:[staff] Value:[
    Kevin
    Gay
    50000
]
NodeName:[firstname] Value:[Kevin]
NodeName:[lastname] Value:[Gay]
NodeName:[salary] Value:[50000]

My code is as follows:

    final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    final DocumentBuilder db = dbf.newDocumentBuilder();
    final ByteArrayInputStream bis = new ByteArrayInputStream(xmlString.getBytes());
    //where xmlString is a file read using DataInputStream.
    final Document doc1 = db.parse(bis);
    printElements(doc1);

void printElements(final Document doc)
{
    final NodeList nl = doc.getElementsByTagName("*");
    Node node;

    for (int i = 0; i < nl.getLength(); i++)
    {
        node = nl.item(i);
        System.out.println("NodeName:[" + node.getNodeName() + "] Value:[" + node.getTextContent() + "]");           
    }
}

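How can I eliminate the staff and company elements from the printout? I don't want to use JAXB or look tags up by name, because the XML tags change every time and I am writing a generic XML parser whose job is to parse tags and their values and put them into a map.

Also, how can I find the parent of the tag I am parsing, so that I can keep track of where the child came from, in this case company->staff->firstname?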

score 1 · Accepted Answer

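This can be done with the following change, which prints only elements that have exactly one child node (their text):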

    for (int i=0; i<nodeList.getLength(); i++) 
    {
        // Get element
        Element element = (Element)nodeList.item(i);
        final NodeList nodes = element.getChildNodes();
        if(nodes.getLength() == 1)
        {               
            System.out.println(element.getNodeName() + " " + element.getTextContent());
        }            
    }
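The `getLength() == 1` check above assumes a leaf holds exactly one text node, which may not hold for empty elements or elements containing comments. Below is a minimal sketch of a variant that instead treats an element as a leaf when it has no child elements. The class name `LeafElements` and the helpers `isLeaf` and `leafValues` are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LeafElements {
    // True when the element has no child elements (text and comments are allowed).
    static boolean isLeaf(Element element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Maps each leaf element's tag name to its trimmed text.
    // If a tag name occurs more than once, the later value overwrites the earlier one.
    static Map<String, String> leafValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList all = doc.getElementsByTagName("*");
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (isLeaf(element)) {
                values.put(element.getNodeName(), element.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>Kevin</firstname>"
                + "<lastname>Gay</lastname><salary>50000</salary></staff></company>";
        System.out.println(leafValues(xml));
    }
}
```

Because the key is only the tag name, a document with several staff entries would keep just the last one; keying by the full path avoids that.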

score 0 · Accepted Answer

JAXB would be a better choice, but you can try the following simple approach:

for (int i = 0; i < nl.getLength(); i++)
{
    node = nl.item(i);

    //check to see if node's name is what you don't want it to be
    if(node.getNodeName().equals("staff") || node.getNodeName().equals("company"))
    {
        //do stuff or don't do anything...
    }
    else //print other stuff
    {
        System.out.println("NodeName:[" + node.getNodeName() + "] Value:[" + node.getTextContent() + "]");
    }           
}

As for your second question, I suggest you look at the Node API:

http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/Node.html

Hint: getParentNode()

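If you want the deepest descendants of a parent (firstname, lastname, salary), start from the first node and call node.getChildNodes() to get its list of children. Search each child recursively until you reach one that has no child elements of its own. That one is a leaf node, and it is the one you want to print.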
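The recursive search and the getParentNode() hint can be combined. The following is only a sketch under assumptions: the class name `ParentPath` and the methods `pathOf`, `collectLeaves`, and `parse` are hypothetical, and the `->` separator simply mirrors the notation used in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ParentPath {
    // Walks up with getParentNode() until the Document node is reached,
    // then joins the element names from the root down with "->".
    static String pathOf(Node node) {
        Deque<String> names = new ArrayDeque<>();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            names.addFirst(n.getNodeName());
        }
        return String.join("->", names);
    }

    // Depth-first search: recurse into child elements; an element without any is a leaf.
    static void collectLeaves(Element element, Map<String, String> out) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collectLeaves((Element) child, out);
            }
        }
        if (!hasChildElement) {
            out.put(pathOf(element), element.getTextContent().trim());
        }
    }

    // Parses the XML string and returns leaf path -> leaf text, in document order.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> out = new LinkedHashMap<>();
        collectLeaves(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>Kevin</firstname>"
                + "<lastname>Gay</lastname><salary>50000</salary></staff></company>";
        parse(xml).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Recomputing the path for every leaf keeps the sketch short; passing the current path down through the recursion would avoid the repeated walk up the tree on large documents.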

score 0 · Accepted Answer

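You can use a SAX parser to parse the XML and write your own handler that extends DefaultHandler.

Keep track of the tags you have read on a stack, and store the characters you read when characters() is called. When endElement() is called, pop the tag name off the top of the stack; the last value read by characters() is that tag's value. The strings left on the stack are the parent tags leading to that tag.

For example, here is a main method that reads the XML file: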

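import java.io.File;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// newSAXParser() and parse() throw checked exceptions, so main declares them
public static void main(String[] args) throws Exception {
    File xmlFile = new File("somefile.xml");

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    MyHandler handler = new MyHandler();

    saxParser.parse(xmlFile, handler);

    Map<String, String> map = handler.getDataMap();
}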

And here is our own handler:

import java.util.HashMap;
import java.util.Map;
import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {
    // text collected since the most recent start or end tag
    private final StringBuilder characters = new StringBuilder();
    private Stack<String> tagStack;

    private Map<String, String> dataMap;

    public MyHandler() {
        this.tagStack = new Stack<String>();
        this.dataMap = new HashMap<String, String>();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
             Attributes attributes) throws SAXException {
        this.tagStack.push(qName);
        // discard any whitespace that preceded this start tag
        this.characters.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
             throws SAXException {
        // only ch[start .. start+length) belongs to this event, and the parser
        // may deliver one run of text in several calls, so append instead of assign
        this.characters.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName,
            String qName) throws SAXException {
        // check that the end element we're looking at matches the last read
        // startElement; a mismatch should only happen if we don't have well-formed XML
        if (qName.equals(this.tagStack.peek())) {
            // trimming to take out whitespace between tags
            String value = this.characters.toString().trim();
            this.characters.setLength(0);

            // container elements such as company and staff have no text of
            // their own, so they are not stored
            if (!value.isEmpty()) {
                // String.join (Java 8+) stands in for StringUtils.join from
                // apache-commons-lang; it joins the stack from bottom to top
                String tagHierarchy = String.join(".", this.tagStack);
                this.dataMap.put(tagHierarchy, value);
            }

            this.tagStack.pop();
        } else {
            throw new SAXException("XML is not well-formed");
        }
    }

    public Map<String, String> getDataMap() {
        return this.dataMap;
    }

}

Given the input data described in the OP, this returns a Map containing:

["company.staff.firstname", "Kevin"]
["company.staff.lastname", "Gay"]
["company.staff.salary", "50000"]

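If you do not want the element's full path as the key, for example if you would rather have a Map where the key is the tag name, value[0] is the parent path, and value[1] is the actual value, you can adapt the handler accordingly.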
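One way that adaptation might look is sketched below. The class name `TagValueHandler` and its static `parse` helper are hypothetical, and the sketch assumes tag names are unique, since a repeated tag name would overwrite the earlier entry. Elements with mixed content (text interleaved with child elements) keep only the text after their last child.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagValueHandler extends DefaultHandler {
    private final Deque<String> tagStack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    // key: tag name; value[0]: dot-separated parent path; value[1]: element text
    private final Map<String, String[]> dataMap = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        tagStack.addLast(qName);
        text.setLength(0); // drop whitespace seen before this start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tagStack.removeLast(); // what remains on the stack is the parent chain
        String value = text.toString().trim();
        text.setLength(0);
        if (!value.isEmpty()) { // skip containers that hold only whitespace
            dataMap.put(qName, new String[] { String.join(".", tagStack), value });
        }
    }

    Map<String, String[]> getDataMap() {
        return dataMap;
    }

    // Convenience helper (hypothetical): parse an XML string with this handler.
    static Map<String, String[]> parse(String xml) throws Exception {
        TagValueHandler handler = new TagValueHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDataMap();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>Kevin</firstname>"
                + "<lastname>Gay</lastname><salary>50000</salary></staff></company>";
        parse(xml).forEach((tag, v) -> System.out.println(tag + " parent=" + v[0] + " value=" + v[1]));
    }
}
```

A small value class with named fields would read better than a `String[]`; the array is used here only because it matches the value[0]/value[1] layout described above.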
