java - getNodeValue() truncates attribute content in org.w3c.dom.Node

Question

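I'm working on Android and need to fetch XML from a URL and retrieve some values from it. The download works fine, but some fields can contain HTML entities (such as &#8211;). When I call getNodeValue() from the Node class (org.w3c.dom.Node), the value stops where the & character is found and the string is truncated.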

For example:

<title>Episode #56 &#8211; Heroes</title>

When I call getNodeValue(), it returns only "Episode #56".

score 0 · Accepted Answer

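You could try something like this:

String str = "<title>Episode #56 &#8211; Heroes</title>";
// Escape every ampersand so the parser no longer sees a character reference.
str = str.replace("&", "&amp;");

Then parse str; it should work. Note that after this escaping the parser treats &#8211; as literal text, so the title comes back as "Episode #56 &#8211; Heroes" and the reference still has to be decoded separately if you want the dash itself. The replacement also touches every ampersand, so any entity that was already valid (for example &amp;) gets escaped a second time.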

Here is a plain sample implementation using a DOM parser.

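import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TitleParser {
    public static void main(String[] args) throws Exception {
        String str = "<title>Episode #56 &#8211; Heroes</title>";
        // Escape the ampersand so the parser treats "&#8211;" as literal text.
        str = str.replace("&", "&amp;");

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        // Use an explicit charset instead of the platform default.
        ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
        // Let parse errors propagate rather than continuing with a null document.
        Document domDoc = docBuilder.parse(bis);

        NodeList nlist = domDoc.getElementsByTagName("title");
        // getTextContent() returns the concatenated text of the element's children.
        System.out.println("title value = " + nlist.item(0).getTextContent());
    }
}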
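
As an alternative to escaping the input, here is a minimal sketch that reads the title without modifying the XML. It rests on an assumption that is not confirmed in the question: that the parser delivers the text around the character reference as several adjacent text nodes, so getNodeValue() on the first child returns only the first piece. The class name TextNodeJoiner and the helpers joinTextChildren and readTitle are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class TextNodeJoiner {

    // Hypothetical helper: concatenates the values of all direct
    // text and CDATA children of the given element.
    static String joinTextChildren(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses the XML string as UTF-8 and returns
    // the joined text of the first <title> element.
    static String readTitle(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return joinTextChildren(doc.getElementsByTagName("title").item(0));
    }

    public static void main(String[] args) throws Exception {
        // The input is left unescaped, so the parser decodes &#8211; itself.
        String xml = "<title>Episode #56 &#8211; Heroes</title>";
        System.out.println("title value = " + readTitle(xml));
    }
}
```

If the assumption holds, calling normalize() on the element before reading it, or using getTextContent() as in the answer's example, should have a similar effect, since both deal with adjacent text nodes. Unlike the escaping approach, this keeps the decoded dash character in the result.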
