ruby - Nokogiri XML.children returns formatting nodes in addition to the actual elements. How can I avoid this?

Question

I have the following XML:

<attributes>
    <intelligence>27</intelligence>
    <memory>21</memory>
    <charisma>17</charisma>
    <perception>17</perception>
    <willpower>17</willpower>
</attributes>

I want to parse it into the following:

intelligence: 27, memory: 21, charisma: 17, perception: 17, willpower: 17

When I try this code:

def get_attributes(api)
  attributes = []
  api.xpath("//attributes").children.each do |attribute|
    name = attribute.name.tr('^A-Za-z0-9', '')
    text = attribute.text
    attributes << "#{name}: #{text}"
  end
  attributes
end

every other child in the result is newline data (because of the formatting):

#(Text "\n      ")
#(Element:0x3ffe166fdb9c { name = "intelligence", children = [ #(Text "20")] })
#(Text "\n      ")
#(Element:0x3ffe166f71ac { name = "memory", children = [ #(Text "25")] })
#(Text "\n      ")
#(Element:0x3ffe166f3818 { name = "charisma", children = [ #(Text "23")] })
#(Text "\n      ")
#(Element:0x3ffe166f0604 { name = "perception", children = [ #(Text "16")] })
#(Text "\n      ")
#(Element:0x3ffe166b52e8 { name = "willpower", children = [ #(Text "15")] })
#(Text "\n    ")

Is there a way in Nokogiri to skip these "formatting only" children? Or do I have to iterate over every second node manually?

I would expect api.xpath("//attributes").children to walk the actual child elements, not the formatting text.

score 6 · Accepted Answer

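The children method returns all child nodes of the target node, including text nodes. If you want only the child element nodes, you can say so in the XPath query by using *: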

def attributes(api)
  api.xpath('//attributes/*').each_with_object([]) do |n, ary|
    ary << "#{n.name}: #{n.text}"
  end
end

This returns an array of strings in the form name: value, which looks like what you want.

score 1 · Accepted Answer

I think the short answer is "no". However, you can easily do:

if attribute.element?
  name = attribute.name.tr('^A-Za-z0-9', '')
  text = attribute.text
  attributes << "#{name}: #{text}"
end

to get the desired effect. Alternatively, this version may be more readable:

if ! attribute.text?
   name = ...
   ...
end

score 1 · Accepted Answer

If you only want the text of the child nodes, use:

require 'nokogiri'
require 'pp'

doc = Nokogiri::HTML(<<EOT)
<attributes>
    <intelligence>27</intelligence>
    <memory>21</memory>
    <charisma>17</charisma>
    <perception>17</perception>
    <willpower>17</willpower>
</attributes>
EOT

doc.at('attributes').children.map(&:text)

which returns:

["27", "21", "17", "17", "17"]

From there you can easily do:

'intelligence: %02d, memory: %02d, charisma: %02d, perception: %02d, willpower: %02d' % doc.at('attributes').children.map(&:text)
=> "intelligence: 27, memory: 21, charisma: 17, perception: 17, willpower: 17"

If you want it to be more structured, you can do:

doc.at('attributes').children.each_with_object({}){ |o,h| h[o.name] = o.text }
=> {"intelligence"=>"27", "memory"=>"21", "charisma"=>"17", "perception"=>"17", "willpower"=>"17"}

Or:

doc.at('attributes').children.each_with_object({}){ |o,h| h[o.name.to_sym] = o.text }
=> {:intelligence=>"27", :memory=>"21", :charisma=>"17", :perception=>"17", :willpower=>"17"}

doc.at('attributes').children
=> [#<Nokogiri::XML::Element:0x3fc3245fb8fc name="intelligence" children=[#<Nokogiri::XML::Text:0x3fc3245fb6f4 "27">]>, #<Nokogiri::XML::Element:0x3fc3245fb4ec name="memory" children=[#<Nokogiri::XML::Text:0x3fc3245fb2e4 "21">]>, #<Nokogiri::XML::Element:0x3fc3245fb0dc name="charisma" children=[#<Nokogiri::XML::Text:0x3fc3245faed4 "17">]>, #<Nokogiri::XML::Element:0x3fc3245fecb4 name="perception" children=[#<Nokogiri::XML::Text:0x3fc3245feaac "17">]>, #<Nokogiri::XML::Element:0x3fc3245fe8a4 name="willpower" children=[#<Nokogiri::XML::Text:0x3fc3245fe69c "17">]>]
