ruby - Selecting adjacent sibling elements without intervening non-whitespace text nodes

Question

Given markup like:

<p>
  <code>foo</code><code>bar</code>
  <code>jim</code> and then <code>jam</code>
</p>

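I need to select the first three <code> elements, but not the last one. The logic is: "Select every code element that has a preceding or following sibling element that is also a code, unless one or more text nodes with non-whitespace content lie between them."

Since I am using Nokogiri (which uses libxml2), I can only use XPath 1.0 expressions.

Although a tricky XPath expression is what I am after, Ruby code that iterates over a Nokogiri document to do the same thing is also acceptable.

Note that the CSS adjacent sibling selector ignores non-element nodes, so nokodoc.css('code + code') would incorrectly select the last <code> block.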

Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]

Edit: More test cases, for clarity:

<section><ul>
  <li>Go to <code>N</code> and
      then <code>Y</code><code>Y</code><code>Y</code>.
  </li>
  <li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

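Every Y above should be selected; no N should be selected. The content of each <code> is only there to indicate which ones should be selected: you may not use the content to decide whether to select an element.

The context elements in which a <code> appears are irrelevant. They may appear in an <li>, they may appear in a <p>, they may appear in something else.

I want to select all consecutive runs of <code> at once. The space character in the middle of one of the runs of Y is not a mistake.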

score 4 · Accepted Answer

Use:

//code
     [preceding-sibling::node()[1][self::code]
    or
      preceding-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      preceding-sibling::node()[2][self::code]
    or
     following-sibling::node()[1][self::code]
    or
      following-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      following-sibling::node()[2][self::code]
     ]

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/">
      <xsl:copy-of select=
       "//code
             [preceding-sibling::node()[1][self::code]
            or
              preceding-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              preceding-sibling::node()[2][self::code]
            or
             following-sibling::node()[1][self::code]
            or
              following-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              following-sibling::node()[2][self::code]
             ]"/>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<section><ul>
      <li>Go to <code>N</code> and
          then <code>Y</code><code>Y</code><code>Y</code>.
      </li>
      <li>If you see <code>N</code> or <code>N</code> then…</li>
    </ul>
    <p>Elsewhere there might be: <code>N</code></p>
    <p><code>N</code> across parents.</p>
    <p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
    <p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

the contained XPath expression is evaluated, and the selected nodes are copied to the output:

<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>

score 3 · Accepted Answer

//code[
  (
    following-sibling::node()[1][self::code]
    or (
      following-sibling::node()[1][self::text() and normalize-space() = ""]
      and
      following-sibling::node()[2][self::code]
    )
  )
  or (
    preceding-sibling::node()[1][self::code]
    or (
      preceding-sibling::node()[1][self::text() and normalize-space() = ""]
      and
      preceding-sibling::node()[2][self::code]
    )
  )
]

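I think this does what you want, although I won't claim that you would actually want to use it.

I am assuming that text nodes are always merged together so that there are never two adjacent to each other, which I believe is usually the case, but might not be if you do DOM manipulation beforehand. I am also assuming that there won't be any other elements between the code elements, or that if there are, they prevent selection just as non-whitespace text does.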

score 1 · Accepted Answer

I think this is what you want:

/p/code[not(preceding-sibling::text()[not(normalize-space(.)="")])]
