xml - Parsing namespaced XML with libxml-ruby

Question

I'm trying to use libxml-ruby to parse XML in the following format (from the European Central Bank data feed):

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" 
                 xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <gesmes:Sender>
    <gesmes:name>European Central Bank</gesmes:name>
  </gesmes:Sender>
  <Cube>
    <Cube time="2009-11-03">
      <Cube currency="USD" rate="1.4658"/>
      <Cube currency="JPY" rate="132.25"/>
      <Cube currency="BGN" rate="1.9558"/>
    </Cube>
  </Cube>
</gesmes:Envelope>

I'm loading the document as follows:

require 'rubygems'
require 'xml/libxml'
doc = XML::Document.file('eurofxref-hist.xml')

But I'm struggling to come up with the right namespace configuration to allow XPath queries on the data.

I can extract all the Cube nodes with the following code:

doc.find("//*[local-name()='Cube']")

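But since the parent nodes and the child nodes are both called Cube, this doesn't really help me iterate over only the parent nodes. Maybe I could modify this XPath to find only the nodes that have a time attribute?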

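My goal is to extract all the Cube nodes that have a time attribute (i.e. <Cube time="2009-11-03">), so that I can read the date and iterate over the exchange rates in the child Cube nodes.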

Can anyone help?

score 3 · Accepted Answer

Either of these should work:

/gesmes:Envelope/Cube/Cube - direct path from root
//Cube[@time] - all cube nodes (at any level) with a time attribute

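OK, this version is tested and working. The default namespace has to be bound to a prefix before Cube will match: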

arrNS = ["xmlns:http://www.ecb.int/vocabulary/2002-08-01/eurofxref", "gesmes:http://www.gesmes.org/xml/2002-08-01"]
doc.find("//xmlns:Cube[@time]", arrNS)

score 0 · Accepted Answer

So I figured it out. The root node defines two namespaces, one with a prefix and one without:

xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref"

When a prefix is defined, you can refer to names in the prefixed namespace quite easily. Using the XML from the original question, this XPath:

/gesmes:Envelope/gesmes:subject

will return "Reference rates".

Because the Cube nodes are not prefixed, we first need to define a namespace prefix for the default namespace. This is how I did it:

doc = XML::Document.file('eurofxref-hist-test.xml')
context = XML::XPath::Context.new(doc)
context.register_namespace('euro', 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref')

Once that is defined, finding the Cube nodes that have a time attribute is straightforward:

context.find("//euro:Cube[@time]").each {|node| .... }
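
For anyone wondering what the loop body might look like, here is a minimal sketch of the whole round trip: bind a prefix to the default namespace, select the dated Cube nodes, then read the rates from their children. Note that it is a stand-in, not the libxml-ruby code above: it uses REXML, which ships with Ruby, so that it runs without extra gems. Only the idea of binding the euro prefix carries over. The variable names ecb_ns and rates_by_date are made up for illustration, and the XML is a trimmed copy of the sample from the question.

```ruby
require 'rexml/document'

# Trimmed copy of the ECB sample from the question.
xml = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
                 xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <Cube>
    <Cube time="2009-11-03">
      <Cube currency="USD" rate="1.4658"/>
      <Cube currency="JPY" rate="132.25"/>
      <Cube currency="BGN" rate="1.9558"/>
    </Cube>
  </Cube>
</gesmes:Envelope>
XML

# Assumption: 'euro' is an arbitrary prefix chosen here; what matters is
# that it maps to the URI the document declares as its default namespace.
ecb_ns = { 'euro' => 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref' }

doc = REXML::Document.new(xml)

# Hypothetical result shape: { "2009-11-03" => { "USD" => 1.4658, ... } }
rates_by_date = {}

# Outer query: only the Cube elements that carry a time attribute.
REXML::XPath.each(doc, '//euro:Cube[@time]', ecb_ns) do |day|
  rates = {}
  # Inner query is relative to the dated Cube, so it sees only its children.
  REXML::XPath.each(day, 'euro:Cube', ecb_ns) do |cube|
    rates[cube.attributes['currency']] = cube.attributes['rate'].to_f
  end
  rates_by_date[day.attributes['time']] = rates
end

rates_by_date.each do |date, rates|
  rates.each { |currency, rate| puts "#{date} #{currency} #{rate}" }
end
```

The attributes themselves (time, currency, rate) are unprefixed and so belong to no namespace, which is why @time needs no prefix in the query even though Cube does.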
