I am parsing definitions from a dictionary API. I have this line of XML:

<dt>:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds with rudimentary wings, stout legs, a long bill, and grayish brown hairlike plumage</dt>

How would I get the full content of the dt element? My code breaks when it reaches the (Apteryx) part, because the element contains additional nested tags. How can I get the whole dt element as a single string? Here is my current code:

Element def = (Element) element.getElementsByTagName("def").item(0);
System.out.println(getValue("dt",def).replaceAll("[^\\p{L}\\p{N} ]", ""));

Where def is the element that holds the dt element.

And here is my getValue code:

private static String getValue(String tag, Element element)
{
    NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
    Node node = (Node) nodes.item(0);
    return node.getNodeValue();
}

Sometimes there are multiple nested tags within the dt element.

1 Answer

Combining https://stackoverflow.com/a/5948326/145757 with "Get a node's internal XML as String in Java DOM", we get:

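// Requires: org.w3c.dom.Node, org.w3c.dom.NodeList,
// org.w3c.dom.ls.DOMImplementationLS, org.w3c.dom.ls.LSSerializer
public static String getInnerXml(Node node)
{
    // Obtain the DOM Level 3 Load/Save implementation from the node's owner document
    DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    // Do not emit an <?xml ...?> declaration for each serialized child
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    // Serialize every child (text and nested elements alike) and concatenate the results
    for (int i = 0; i < childNodes.getLength(); i++)
    {
       sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString();
}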

Applied as in my comment, this gives:

getInnerXml(document.getElementsByTagName("dt").item(0));

Result:

:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds...
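
If the nested markup itself is not needed (the question's code strips everything except letters, digits and spaces anyway), `Node.getTextContent()` may be a simpler route: it returns the concatenated text of all descendants with the tags dropped. The sketch below is only an illustration under that assumption; the class name `DtTextDemo`, the helpers `getText` and `parse`, and the shortened sample XML are made up for this example and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DtTextDemo {

    // Hypothetical replacement for getValue: returns the text of the first
    // <tag> descendant, including text inside nested elements such as <it>.
    static String getText(String tag, Element parent) {
        Node first = parent.getElementsByTagName(tag).item(0);
        return first == null ? "" : first.getTextContent();
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; the real API response is assumed to look similar.
        String xml = "<def><dt>:any of a small genus (<it>Apteryx</it>) "
                   + "of flightless New Zealand birds</dt></def>";
        Element def = parse(xml).getDocumentElement();
        // prints ":any of a small genus (Apteryx) of flightless New Zealand birds"
        System.out.println(getText("dt", def));
    }
}
```

Use `getInnerXml` when the `<it>` tags must be preserved in the output, and `getTextContent()` when only the readable text matters.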

Hope this helps...

Answered 2013-06-11T15:55:00.103