ruby - Unable to retrieve data in the Google namespace from an XML document using Nokogiri

Question

I have this Google Shopping feed:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
  <item>
    <title>test</title>
    <g:id>1</g:id>
    <g:color>blue</g:color>
  </item>
  <item>
    <title>test2</title>
    <g:id>2</g:id>
    <g:color>red</g:color>
  </item>
</channel></rss>

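I have been searching for days and can't seem to find an answer. I also went through the Nokogiri documentation, but that didn't resolve anything either.

What I'm trying to do: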

doc = Nokogiri::XML(*Google Shopping Feed*)
doc.css('channel > item').each do |item|
  puts item.css('g:id')
end

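But this returns nothing. I have tried many suggestions, but none of them seem to work. I'm clearly missing something here, but I don't know what.

The other thing I can't figure out is how to retrieve a list of all the attributes in an item. So my question is: how do I retrieve the following array from the Google Shopping feed?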

# attributes => ['title', 'g:id', 'g:color']

score 0 · Accepted Answer

If you want to keep the namespace information, the simplest solution is probably to use XPath expressions.

Something like:

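doc.xpath('//item').each_with_index do |node, i|
  puts "Element #{i} attributes:"
  # Select the child elements themselves; "*/text()" would yield text
  # nodes, whose name is always "text".
  node.xpath('*').each do |element|
    # Node#name is the local name, so add the namespace prefix if there is one.
    prefix = element.namespace ? "#{element.namespace.prefix}:" : ""
    puts "#{prefix}#{element.name}: #{element.text}"
  end
end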

score 0 · Accepted Answer

Try at_xpath together with text:

doc.css('channel > item').each do |item|
  puts item.at_xpath('g:id').text
end
#=> 1
#=> 2

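The other thing I can't figure out is how to retrieve a list of all the attributes in an item.

You can get an array of names for each item like this: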

doc.css('channel > item').map do |item|
  item.element_children.map do |key|
    prefix = "#{key.namespace.prefix}:" if key.namespace
    name   = key.name

    "#{prefix}#{name}"
  end
end
#=> [["title", "g:id", "g:color"], ["title", "g:id", "g:color"]]

If all items have exactly the same attributes, you can use just the first item (instead of iterating over all of them):

doc.css('channel > item').first.element_children.map do |key|
  prefix = "#{key.namespace.prefix}:" if key.namespace
  name   = key.name

  "#{prefix}#{name}"
end
#=> ["title", "g:id", "g:color"]
