
I'm using Ruby 1.9.3p385 with Nokogiri to parse XML files. I'm not sure which XPath version I'm using, but it responds to v1 syntax and functions, not v2 syntax.

I have this XML file:

<root_tag>
  <middle_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 1</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 1</tag_2>
      </headline_2>
    </item_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 2</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 2</tag_2>
      </headline_2>
    </item_tag>
  </middle_tag>
</root_tag>

I want to extract all products, and for that I use the following code:

products = xml_file.xpath("/root_tag/middle_tag/item_tag/headline_1|/root_tag/middle_tag/item_tag/headline_2")

puts products.size # => 4

If you look at the output using:

products.each_with_index do |product, i|
  puts "product #{i}:"
  puts product
end

you get this:

product 0:
<headline_1>
  <tag_1>Product title 1</tag_1>
</headline_1>
product 1:
<headline_2>
  <tag_2>Product attribute 1</tag_2>
</headline_2>
product 2:
<headline_1>
  <tag_1>Product title 2</tag_1>
</headline_1>
product 3:
<headline_2>
  <tag_2>Product attribute 2</tag_2>
</headline_2>

I need my code to join/merge all matches for an item into the same result (so products.size should be 2). The final output should look like this:

product 0:
<headline_1>
  <tag_1>Product title 1</tag_1>
</headline_1>
<headline_2>
  <tag_2>Product attribute 1</tag_2>
</headline_2>
product 1:
<headline_1>
  <tag_1>Product title 2</tag_1>
</headline_1>
<headline_2>
  <tag_2>Product attribute 2</tag_2>
</headline_2>

I've looked all over the internet, but all the variations, such as:

products = xml_file.xpath("/root_tag/middle_tag/item_tag/*[self::headline_1|self::headline_2]")

all seem to output the same result.

Am I missing some important point about XPath, or have I overlooked something?


1 Answer


XPath only returns flat results (node-sets in 1.0, sequences in 2.0), so there is nothing like nested subsequences. You will have to wrap each "product" in some XML element. Fortunately, such an element already exists (<item_tag/>), so the code is rather simple:

products = doc.xpath("//item_tag")
products.each_with_index do |product, i|
  puts "product #{i}:"
  product.children.each do |line|
    puts line
  end
end

The output is as follows (it probably needs some more formatting, but I'm not used to Ruby and can't help you with that):

product 0:

<headline_1>
        <tag_1>Product title 1</tag_1>
      </headline_1>

<headline_2>
        <tag_2>Product attribute 1</tag_2>
      </headline_2>

product 1:

<headline_1>
        <tag_1>Product title 2</tag_1>
      </headline_1>

<headline_2>
        <tag_2>Product attribute 2</tag_2>
      </headline_2>

To match all <headline_n/> tags, you can also use //*[starts-with(local-name(), 'headline')], which makes the code more flexible.

Answered 2013-03-30T13:06:04.950