ruby - Parsing an XML file with Nokogiri to determine the path (Ruby)

Question

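My code is supposed to "guess" the path leading to the relevant text nodes in my XML file. In this case, relevant means: text nodes nested inside a recurring product/person/something tag, but not text nodes used outside of it.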

This code:

    require 'nokogiri'

    @doc, items = Nokogiri.XML(@file), []

    path = []
    # traverse visits a node's children before the node itself,
    # so deeper element names are pushed first.
    @doc.traverse do |node|
      if node.element?
        # A "path element" is any element with at least one element child.
        is_path_element = node.element_children.any?
        path.push(node.name) if is_path_element && !path.include?(node.name)
      end
    end
    final_path = "/" + path.reverse.join("/")

works for simple XML files such as:

    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>Some XML file title</title>
        <description>Some XML file description</description>
        <item>
          <title>Some product title</title>
          <brand>Some product brand</brand>
        </item>
        <item>
          <title>Some product title</title>
          <brand>Some product brand</brand>
        </item>
      </channel>
    </rss>

    puts final_path # => "/rss/channel/item"

But how should I handle it when the structure gets more complex, for example in this one:

    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>Some XML file title</title>
        <description>Some XML file description</description>
        <item>
          <titles>
            <title>Some product title</title>
          </titles>
          <brands>
            <brand>Some product brand</brand>
          </brands>
        </item>
        <item>
          <titles>
            <title>Some product title</title>
          </titles>
          <brands>
            <brand>Some product brand</brand>
          </brands>
        </item>
      </channel>
    </rss>

score 3 · Accepted Answer

If you are looking for a list of the deepest "parent" paths in the XML, there is more than one way to approach it.

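Although I think your own code could be adjusted to produce the same output, I'm confident the same goal can be reached with XPath. My motivation is to keep my XML skills from getting rusty (I haven't used Nokogiri yet, but I'll soon need to professionally). So here is how to use XPath to get every parent path that has exactly one level of children below it: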

    xml = Nokogiri::XML(@file)  # the document from the question
    xml.xpath('//*[child::* and not(child::*/*)]').each { |node| puts node.path }

The output for the second example file is:

    /rss/channel/item[1]/titles
    /rss/channel/item[1]/brands
    /rss/channel/item[2]/titles
    /rss/channel/item[2]/brands

If you take this list, gsub out the indices, and then make the array unique, the result looks a lot like the output of your loop:

    paths = xml.xpath('//*[child::* and not(child::*/*)]').map { |node| node.path }
    # Non-destructive map/uniq: uniq! returns nil when there is nothing to remove
    paths = paths.map { |path| path.gsub(/\[[0-9]+\]/, '') }.uniq
    # => ["/rss/channel/item/titles", "/rss/channel/item/brands"]

Or in a single line:

    paths = xml.xpath('//*[* and not(*/*)]').map { |node| node.path.gsub(/\[[0-9]+\]/, '') }.uniq
    # => ["/rss/channel/item/titles", "/rss/channel/item/brands"]
