xml - Parsing a Blogspot XML file with Nokogiri

Question

I have an XML file exported from Blogspot. It looks like this:

<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>

How can I parse it with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("blogspot.xml"))

doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end

But it doesn't return anything :/

Any suggestions? :/

score 0 · Accepted Answer

I just stumbled on this question. The problem appears to be XML namespaces:

"Turns out I had to remove the attributes of the feed element"

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

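XML namespaces complicate access to nodes because they provide a way to separate tags that share the same name. Here the feed element declares a default namespace (xmlns='http://www.w3.org/2005/Atom'), so feed, entry and content all belong to the Atom namespace, while an unprefixed XPath step such as //content only matches elements that are in no namespace. That is why the original query finds nothing. Read "Searching an HTML / XML Document" in the Nokogiri documentation.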

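Nokogiri also has remove_namespaces!, which is sometimes a useful way to deal with the problem, but it has drawbacks: it discards all namespace information, so elements that share a local name but come from different namespaces can no longer be told apart.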

score 0 · Accepted Answer

Your code works for me. Some versions of Nokogiri have issues.

I get:

 Content
 Content

I'm using nokogiri (1.4.1 x86-mswin32).

score 0 · Accepted Answer

It turned out I had to remove the attributes from the feed element:

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>
