ruby - Why doesn't xpath return this XML node?

Question

So I have the following code:

require 'nokogiri'
require 'open-uri'

content_url = 'http://auburn.craigslist.org/cpg/index.rss'
doc = Nokogiri::XML(URI.open(content_url))
bq = doc.xpath('//item')

But bq comes back empty.

I'm sure the document has that tag. Here are the first few tags on that page:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:ev="http://purl.org/rss/1.0/modules/event/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:admin="http://webns.net/mvcb/">
<channel rdf:about="http://auburn.craigslist.org/cpg/index.rss">...</channel>
<item rdf:about="http://auburn.craigslist.org/cpg/3012277218.html">...</item>

Any ideas?

score 5 · Accepted Answer

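The item element is in the document's default namespace, and an unprefixed name in an XPath expression only matches elements that are in no namespace at all. So you need to tell XPath which namespace to look in.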

First, your namespace is whatever the xmlns attribute is set to. For Craigslist, that appears to be http://purl.org/rss/1.0/. This is the namespace you have to tell XPath to use.

When calling xpath, we have to pass in the extra namespace we want to use, as a hash that maps a prefix to the namespace URI. Like this:

doc.xpath('//item', { 'rdf' => 'http://purl.org/rss/1.0/' })

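That alone is not enough, though. We also need to tell XPath that item is in the namespace we bound to the rdf prefix, which we do by putting the prefix in front of the tag name. Note that the prefix is just a local label for this query: here it is called rdf but is bound to the RSS 1.0 URI, and it is unrelated to the rdf prefix declared in the document itself. Like this: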

doc.xpath('//rdf:item', { 'rdf' => 'http://purl.org/rss/1.0/' })

score 3 · Accepted Answer

It has to do with namespaces. You can strip them from the document before querying:

doc.remove_namespaces!

Or you can use

doc.css('item')

instead.
