ruby - Should Nokogiri::XML.parse create separate text nodes for newlines?

Question

I have an XML document created by an external tool:

<?xml version="1.0" encoding="UTF-8"?>
<suite>
  <id>S1</id>
  <name>First Suite</name>
  <description></description>
  <sections>
    <section>
      <name>section 1</name>
      <cases>
        <case>
          <id>C1</id>
          <title>Test 1.1</title>
          <type>Other</type>
          <priority>4 - Must Test</priority>
          <estimate></estimate>
          <milestone></milestone> 
          <references></references> 
        </case>                             
        <case>
          <id>C2</id>
          <title>Test 1.2</title>
          <type>Other</type>
          <priority>4 - Must Test</priority>
          <estimate></estimate>
          <milestone></milestone> 
          <references></references> 
        </case>
      </cases>
    </section>
  </sections>
</suite>

From irb, I run the following (output is suppressed until the final command):

> require('nokogiri')
> doc = Nokogiri::XML.parse(open('./test.xml'))
> test_case = doc.search('case').first
=> #<Nokogiri::XML::Element:0x3ff75851bc44 name="case" children=[#<Nokogiri::XML::Text:0x3ff75851b8fc "\n          ">, #<Nokogiri::XML::Element:0x3ff75851b7bc name="id" children=[#<Nokogiri::XML::Text:0x3ff75851b474 "C1">]>, #<Nokogiri::XML::Text:0x3ff75851b1cc "\n          ">, #<Nokogiri::XML::Element:0x3ff75851b078 name="title" children=[#<Nokogiri::XML::Text:0x3ff75851ad58 "Test 1.1">]>, #<Nokogiri::XML::Text:0x3ff75851aa9c "\n          ">, #<Nokogiri::XML::Element:0x3ff75851a970 name="type" children=[#<Nokogiri::XML::Text:0x3ff75851a6c8 "Other">]>, #<Nokogiri::XML::Text:0x3ff7585191d8 "\n          ">, #<Nokogiri::XML::Element:0x3ff7585190d4 name="priority" children=[#<Nokogiri::XML::Text:0x3ff758518d64 "4 - Must Test">]>, #<Nokogiri::XML::Text:0x3ff758518ad0 "\n          ">, #<Nokogiri::XML::Element:0x3ff7585189a4 name="estimate">, #<Nokogiri::XML::Text:0x3ff758518670 "\n          ">, #<Nokogiri::XML::Element:0x3ff758518558 name="milestone">, #<Nokogiri::XML::Text:0x3ff7585182b0 "\n          ">, #<Nokogiri::XML::Element:0x3ff758518184 name="references">, #<Nokogiri::XML::Text:0x3ff758517ef0 "\n        ">]>

This produces some children that look like this:

#<Nokogiri::XML::Text:0x3ff758517ef0 "\n        ">

I want to iterate over these XML nodes without having to do something like this:

> real_nodes = test_case.children.reject{|n| n.node_name == 'text' && n.content.strip.empty?}

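I can't find a parse option in the Nokogiri documentation that stops newlines from being treated as separate nodes. Is there a way to do this during parsing rather than afterwards?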

score 6 · Accepted Answer

Check the documentation. You can do this:

doc = Nokogiri::XML.parse(open('./test.xml')) do |config|
    config.noblanks
end

This loads the file without any blank nodes.

score 0 · Accepted Answer

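The text nodes are a result of pretty-printing the XML. The specification doesn't require whitespace between tags, and for efficiency a huge XML file can have its inter-tag whitespace stripped to save space and reduce transfer time without sacrificing any data content.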

This might show what is happening:

require 'nokogiri'

xml = '<foo></foo>'
Nokogiri::XML(xml).at('foo').child
=> nil

There is no whitespace between the tags, so there is no text node.

xml = '<foo>
</foo>'
Nokogiri::XML(xml).at('foo').child
=> #<Nokogiri::XML::Text:0x3fcee9436ff0 "\n">
Nokogiri::XML(xml).at('foo').child.class
=> Nokogiri::XML::Text

With whitespace added for pretty-printing, the XML has a text node right after the opening foo tag.
