ruby - XML parsing with smart tag grouping in Ruby

Question

Here is an example of the conversion I want to achieve. Source XML:

<cats>
  <cat>John</cat>
  <cat>Peter</cat>
</cats>

Result:

{'cats' => ['John', 'Peter']}

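I also want the value of 'cats' in the resulting hash to be an array even when the source XML contains only one <cat>.

So I would like the parser to apply this rule:

If a node xyzs contains one or more child nodes named xyz (and no other nodes), then xyzs should be represented in the resulting hash as an array named xyzs (and each element of the array should be the content of the corresponding xyz element).

Here is how to achieve it with the XmlSimple lib: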

XmlSimple.xml_in('cats.xml',{:forcearray=>['cat'], :grouptags=>{"cats"=>"cat"}})

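However, I have to type in the names of all the target elements, and there seems to be no other way to define the forcearray/grouptags behavior in XmlSimple.

It would not be hard to hack together a preprocessing routine that extracts all the names and then passes them to the xml_in method, but maybe there is a more elegant (i.e. already written) way to do this?

(I am happy to use any other XML parsing library if it can do the conversion.)

UPD: If it matters, my final goal is to save the resulting hash into MongoDB (i.e. the overall conversion is XML -> BSON).

UPD2: Again, I don't want to specify the names of the elements that should be treated as arrays; I want the lib to do the magic for me.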
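A minimal sketch of what such a preprocessing routine might look like. It uses REXML instead of XmlSimple to scan the document, `grouping_options` is a hypothetical helper name, and the singular form is assumed to be the plural with its trailing s removed:

```ruby
require 'rexml/document'

# Hypothetical helper (not part of XmlSimple): finds every element named
# "<x>s" whose element children are all named "<x>" and builds the
# forcearray / grouptags options from those names.
def grouping_options(xml_text)
  doc = REXML::Document.new(xml_text)
  group_tags = {}
  REXML::XPath.each(doc, '//*') do |el|
    next unless el.name.end_with?('s')

    singular = el.name.chomp('s') # assumption: naive singular, "cats" -> "cat"
    children = el.elements.to_a   # element children only, text nodes are ignored
    next if children.empty?
    next unless children.all? { |child| child.name == singular }

    group_tags[el.name] = singular
  end
  { forcearray: group_tags.values.uniq, grouptags: group_tags }
end

options = grouping_options('<cats><cat>John</cat></cats>')
# The options could then be passed on, e.g. XmlSimple.xml_in('cats.xml', options)
```

The option keys mirror the ones used in the xml_in call above. Plurals that are not formed with a plain s are skipped by this sketch.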

score 1 · Accepted Answer

First, find the element names that end in s:

names = doc.search('*[name()$="s"]').map(&:name).uniq
#=> ["cats"]

The rest is just mapping and building the hash:

Hash[names.map{|name| [name, doc.search("#{name} > #{name.sub /s$/, ''}").map(&:text)]}]
#=> {"cats"=>["John", "Peter"]}
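Stripping a trailing s, as in the snippet above, does not cover plurals such as kitties. One way to handle a few more cases without an inflection library is a small rule list. This is only a sketch: `naive_singularize` is a hypothetical helper, and the three rules are an assumption that will miss irregular plurals:

```ruby
# Hypothetical helper: applies the first matching rule and returns the
# word unchanged when no rule matches.
def naive_singularize(word)
  rules = [
    [/ies\z/, 'y'],              # kitties -> kitty
    [/(ch|sh|x|ss)es\z/, '\1'],  # boxes -> box, classes -> class
    [/(?<!s)s\z/, '']            # cats -> cat, but "class" is left alone
  ]
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

naive_singularize('kitties') # => "kitty"
naive_singularize('cats')    # => "cat"
```

The order of the rules matters: the more specific endings are tried before the generic trailing s.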

score 1 · Accepted Answer

With Nokogiri, we can write the following code:

require 'inflector'
require 'nokogiri'

def get_xml_stuff(xml, singular)
  plural = Inflector.pluralize(singular)
  return_hash = {plural => []}
  xml.xpath("*/#{plural}/#{singular}").each { |tag| return_hash[plural] << tag.text}
  return return_hash
end

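In my tests, this handles the simple case that matches your XmlSimple code. As for your further requirement:

If a node xyzs contains one or more child nodes named xyz (and no other nodes), then xyzs should be represented in the resulting hash as an array named xyzs (and each element of the array should be the content of the corresponding xyz element).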

def get_xml_stuff(xml, singular)
  plural = Inflector.pluralize(singular)
  return_hash = {plural => []}
  path = xml.xpath("*/#{plural}/#{singular}")
  # Collect only when every element child of the plural node is a singular one
  path.each { |tag| return_hash[plural] << tag.text } if path.size == xml.xpath("*/#{plural}/*").size
  return return_hash
end

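However, this is still not perfect if the same plural appears more than once in the file.

In answer to UPD2, my new version of the function is as follows: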

def get_xml_stuff(xml, plural)
  singular = Inflector.singularize(plural)
  return_hash = {plural => []}
  path = xml.xpath("./#{singular}")
  path.each { |tag| return_hash[plural] << tag.text} unless path.size != xml.xpath("./*").size
  return return_hash
end

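Here we start from the plural parent node and collect all of the singular child nodes, provided that every child element has that singular name. My new test code becomes: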

sample_xml = Nokogiri::XML(sample_xml_text)
sample_xml.children.xpath("*").each do |child|
  array = get_xml_stuff(child, child.name)
  p array
end

If there is no wrapping tag like the <pets> in my example, the following should work:

sample_xml = Nokogiri::XML(sample_xml_text)
array = get_xml_stuff(sample_xml.children.first, sample_xml.children.first.name)
p array

End of UPD2.

For reference, my tests were:

sample_xml_text = <<-sample
<pets>
  <cats>
    <cat>John</cat>
    <cat>Peter</cat>
  </cats>
  <kitties>
    <kitty>Tibbles</kitty>
    <kitty>Meow-chan</kitty>
    <kitty>Puss</kitty>
  </kitties>
  <giraffes>
    <giraffe>Long Neck</giraffe>
  </giraffes>
  <dogs>
    <dog>Rover</dog>
    <dog>Spot</dog>
    <cat>Peter</cat>
  </dogs>
</pets>
sample

sample_xml = Nokogiri::XML(sample_xml_text)
array = get_xml_stuff(sample_xml, "cat")
p array
array = get_xml_stuff(sample_xml, "kitty")
p array
array = get_xml_stuff(sample_xml, "giraffe")
p array
array = get_xml_stuff(sample_xml, "dog")
p array
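For a single nested hash covering the whole document (for example, to store it in MongoDB), the same rule can be applied recursively. The following is a rough sketch rather than a tested library routine. It uses REXML instead of Nokogiri, `element_to_value` and `xml_to_hash` are hypothetical names, and the singular form is guessed with two simple substitutions:

```ruby
require 'rexml/document'

# Hypothetical helper: converts one element into a Ruby value.
#   leaf element                        -> its text
#   "<x>s" whose children are all <x>   -> array of the children's values
#   anything else                       -> hash keyed by child name
def element_to_value(el)
  children = el.elements.to_a
  return el.text.to_s.strip if children.empty?

  # Assumption: naive singular form, "kitties" -> "kitty", "cats" -> "cat"
  singular = el.name.sub(/ies\z/, 'y').sub(/s\z/, '')
  if singular != el.name && children.all? { |child| child.name == singular }
    return children.map { |child| element_to_value(child) }
  end

  # Mixed children: group by name, repeated names become arrays
  children.group_by(&:name).transform_values do |same_name|
    values = same_name.map { |child| element_to_value(child) }
    values.size == 1 ? values.first : values
  end
end

def xml_to_hash(xml_text)
  root = REXML::Document.new(xml_text).root
  { root.name => element_to_value(root) }
end

xml_to_hash('<cats><cat>John</cat></cats>') # => {"cats"=>["John"]}
```

With this sketch, a node such as the <dogs> element in the sample, which mixes <dog> and <cat> children, falls back to a hash instead of an array.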
