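This Java code prints the title, link, and publication date of each item in the NYT World RSS feed. For the NYT Science RSS feed, however, it does not print the link field. What is happening here?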

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.parse( direccion );
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/rss/channel/item");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nl.getLength(); i++) {
    Node node = nl.item(i);

    Node nodoTitulo = (Node) xpath.evaluate("title", node, XPathConstants.NODE);
    System.out.println(nodoTitulo.getTextContent());

    Node nodoLink = (Node) xpath.evaluate("link", node, XPathConstants.NODE);
    System.out.println(nodoLink.getTextContent());

    Node nodoFecha = (Node) xpath.evaluate("pubDate", node, XPathConstants.NODE);
    System.out.println(nodoFecha.getTextContent());
    System.out.println();
}

1 Answer

This is a namespace problem.

In the Science RSS feed, you have

<atom:link href="http://www.nytimes.com/2012/08/19/business/new-wave-of-adept-robots-is-changing-global-industry.html?partner=rss&amp;emc=rss" rel="standout"/>
<title>The iEconomy: New Wave of Deft Robots Is Changing Global Industry</title>
<link>http://feeds.nytimes.com/click.phdo?i=5861b5e3f6b66da6ca12beab1e5d8729</link>

In the World RSS feed, you have

<title>Syrian Rebels Claim to Have Brought Down a Jet</title>
<link>http://feeds.nytimes.com/click.phdo?i=314bd32f9d6141a500e76e3076c489c9</link>
.
.
.
<atom:link rel="standout" href="http://www.nytimes.com/2012/08/14/world/middleeast/syrian-rebels-claim-to-have-brought-down-a-jet.html?partner=rss&amp;emc=rss"/>

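Your code selects the <atom:link> node first. A DocumentBuilderFactory is not namespace-aware by default, so the XPath expression "link" also picks up <atom:link>. In the Science feed that element comes before <link>, and because it is an empty element its text content is empty, so nothing is printed for the link.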

Add:

factory.setNamespaceAware(true);

after creating the factory and before creating the builder. You should now get the link:

title = The iEconomy: New Wave of Deft Robots Is Changing Global Industry
link = http://feeds.nytimes.com/click.phdo?i=5861b5e3f6b66da6ca12beab1e5d8729
pubDate = Sun, 19 Aug 2012 21:26:33 GMT
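As a minimal sketch of the fix above, the following self-contained program parses a small inline document shaped like the Science feed (with <atom:link> placed before <link>) using a namespace-aware factory. The class name RssLinkDemo, the SAMPLE string, the firstLink helper, and the example.com URLs are all hypothetical and exist only for illustration; the real feed would be parsed from its URL as in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RssLinkDemo {
    // Hypothetical sample, not real feed data: atom:link precedes link,
    // as in the Science feed excerpt.
    static final String SAMPLE =
        "<rss xmlns:atom=\"http://www.w3.org/2005/Atom\" version=\"2.0\"><channel><item>"
        + "<atom:link href=\"http://example.com/article\" rel=\"standout\"/>"
        + "<title>Sample title</title>"
        + "<link>http://example.com/click</link>"
        + "</item></channel></rss>";

    // Returns the text of the first un-namespaced <link> child of the first item.
    static String firstLink(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The fix: must be set before the builder is created.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            Node item = (Node) xpath.evaluate("/rss/channel/item", doc, XPathConstants.NODE);
            // With namespace awareness, "link" matches only elements in no namespace,
            // so atom:link is skipped.
            Node link = (Node) xpath.evaluate("link", item, XPathConstants.NODE);
            return link.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("link = " + firstLink(SAMPLE));
    }
}
```

The only change relative to the code in the question is the setNamespaceAware(true) call; the XPath expressions stay the same.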

If you are really interested, you can read this for more information.

Answered 2012-08-20T04:46:05.810