java - DOM Parser receives NullPointerException on a pure-HTML RSS post

Question

I'll try to explain this as clearly as I can, though I'm not sure I'll succeed.

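I've implemented a DOM parser on Android to parse a typical RSS feed, based on some code found here. It works for almost every feed I've tried, but I just hit a NullPointerException on the line theString = nchild.item(j).getFirstChild().getNodeValue(); for one particular post in one particular feed from a Blogger site (my code is below). I know it is only this post, because I rewrote the loop to ignore it, and then no error occurred and parsing continued normally. Looking at this post in the actual RSS feed, it appears to be written entirely in HTML (rather than plain text), whereas the posts that parse successfully are not.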

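Could this be the cause of the problem, or should I keep looking? If it is the cause, how do I fix it? Is there a way to ignore posts written this way? I looked for alternative examples to compare against and try, but it seems everyone uses the same basic code in their tutorials.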

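The post in question is just a link plus a few lines of colored text in different fonts inside a <div> tag. I would post it here, but I'm not sure the feed's owner would want me to (I'll ask, and update if I can).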

My parser:

try {
        // Create required instances
        DocumentBuilderFactory dbf;
        dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Parse the xml
        Document doc = db.parse(new InputSource(url.openStream()));
        doc.getDocumentElement().normalize();

        // Get all <item> tags.
        NodeList nl = doc.getElementsByTagName("item");
        int length = nl.getLength();

        for (int i = 0; i < length; i++) {
            Node currentNode = nl.item(i);
            RSSItem _item = new RSSItem();

            NodeList nchild = currentNode.getChildNodes();
            int clength = nchild.getLength();

            for (int j = 1; j < clength; j = j + 2) {

                Node thisNode = nchild.item(j);
                String theString = null;
                String nodeName = thisNode.getNodeName();

                theString = nchild.item(j).getFirstChild().getNodeValue();
                if (theString != null) {
                    if ("title".equals(nodeName)) {
                        _item.setTitle(theString);
                    } else if ("description".equals(nodeName)) {
                        _item.setDescription(theString);
                    } else if ("pubDate".equals(nodeName)) {
                        String formatedDate = theString.replace(" +0000", "");
                        _item.setDate(formatedDate);
                    } else if ("author".equals(nodeName)) {
                        _item.setAuthor(theString);
                    }
                }
            }
            _feed.addItem(_item);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return _feed;
}

As I mentioned, I changed the code to ignore the (third) post that causes the problem:

if(i != 3){
    if (theString != null) {
        if ("title".equals(nodeName)) {
            _item.setTitle(theString);
        } else if ("description".equals(nodeName)) {
            _item.setDescription(theString);
        } else if ("pubDate".equals(nodeName)) {
            String formatedDate = theString.replace(" +0000", "");
            _item.setDate(formatedDate);
        } else if ("author".equals(nodeName)) {
            _item.setAuthor(theString);
        }
    }
}

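With this change everything works as expected, except that the third post is skipped. Any help is appreciated; I've been searching for a while with no luck. I would post my logcat, but beyond the line I pasted at the start of this question it isn't very useful, since the exception comes back through an AsyncTask.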

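Oh, one workaround I'm considering is to parse the description first instead of the title (rewriting the loop, of course) and check whether it is null before continuing to parse. That would be messy, though, so I'm looking for alternatives.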

score 1 · Accepted Answer

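Look at the HTML you are trying to parse. I'm almost sure the third post has no children; that is, it is empty. For example, this node would throw an exception: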

<Element></Element>
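To see why an empty element fails, here is a minimal sketch using the standard JAXP DOM parser. The class name EmptyElementDemo and the helper firstChildOfRoot are made up for illustration and are not part of the question's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EmptyElementDemo {

    // Hypothetical helper: parses a small XML string and returns
    // the first child of its root element (or null if it has none).
    static Node firstChildOfRoot(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An element containing text has a text node as its first child.
        System.out.println(firstChildOfRoot("<Element>text</Element>").getNodeValue());

        // An empty element has no children, so getFirstChild() returns null;
        // calling getNodeValue() on that result would throw NullPointerException.
        System.out.println(firstChildOfRoot("<Element></Element>"));
    }
}
```

Run as written, this should print text and then null, which is the situation the failing line runs into.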

So you must avoid calling getNodeValue before checking whether the node has any children:

theString = nchild.item(j).getFirstChild().getNodeValue();

To avoid this, you can do the following:

  if (nchild.item(j).getFirstChild() != null) {
        theString = nchild.item(j).getFirstChild().getNodeValue();
        // ... and the rest of your code
  }
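As a slightly fuller sketch of the same check, the guard can be pulled into a small helper so the loop never dereferences a missing child. The names SafeItemText, firstChildValue and childText are assumptions made for this example, and the feed string is invented test data, not the asker's feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SafeItemText {

    // Returns the value of the node's first child, or null when there is no child.
    static String firstChildValue(Node node) {
        Node first = node.getFirstChild();
        return first == null ? null : first.getNodeValue();
    }

    // Hypothetical helper: looks up the named child element of the first <item>
    // and returns its text, or null if the element is missing or empty.
    static String childText(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getElementsByTagName("item").item(0).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Only look at element nodes; whitespace between tags shows up as text nodes.
                if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                    return firstChildValue(child);
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample feed with an empty <description>.
        String xml = "<rss><channel><item>"
                + "<title>Post</title><description></description>"
                + "</item></channel></rss>";
        System.out.println(childText(xml, "title"));
        System.out.println(childText(xml, "description"));
    }
}
```

The existing if (theString != null) test in the question's loop then handles the empty case without further changes. Checking the node type also skips whitespace text nodes, which have no children either, so the loop would not need to step by two; whether that matters for the feed in question is not something the post confirms.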
