html - Can XPath locate a node that follows arbitrarily placed text in invalid markup?

Question

I have a document written by a naughty web developer, and it looks like this:

<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br>
    <br>
    And some more.
    <table id="non-unique">
        ...
    </table>
    Replaces the following numbers:
    <table id="non-unique">
        ... good stuff in here
    </table>
</div>

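So it isn't marked up well. I need to grab the table containing the good stuff, but it has no unique id value, and it isn't always in the same position, or the last one in the div, etc.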

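The only running theme is that it always follows the text Replaces the following numbers:, although this text may be bare as in the example above, or sometimes inside an h4 element!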

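Is it possible to address this table with an XPath expression by searching for the Replaces string and then asking for the next table element?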

Thanks!

score 1 · Accepted Answer

This seems to work for me:

//text()[contains(.,"Replaces the following numbers")]/following-sibling::table[1]

Nothing in this expression requires the id to be unique.
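
For readers without an XPath 1.0 engine at hand, the same lookup can be approximated in plain Python. This is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree, which has no text() or following-sibling support, so the axis logic is emulated by hand; the input must already be well-formed XML; and the function name first_table_after_text is made up for illustration. It returns only the first match, whereas the XPath expression can return one table per matching text node.

```python
import xml.etree.ElementTree as ET

LABEL = "Replaces the following numbers"

def first_table_after_text(root, label=LABEL):
    """Return the first <table> sibling that follows a text node containing label."""
    for parent in root.iter():
        children = list(parent)
        # In ElementTree, the text just before child i is parent.text for
        # i == 0, otherwise the tail of child i - 1.
        texts = [parent.text] + [c.tail for c in children[:-1]]
        for i, text in enumerate(texts):
            if text and label in text:
                # Emulates following-sibling::table[1].
                for sibling in children[i:]:
                    if sibling.tag == "table":
                        return sibling
    return None

# Well-formed version of the markup from the question.
doc = """<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.
    <table id="non-unique">
        ...
    </table>
    Replaces the following numbers:
    <table id="non-unique">
        ... good stuff in here
    </table>
</div>"""

table = first_table_after_text(ET.fromstring(doc))
print(table.text.strip())
```

Like the XPath above, this matches on a substring, so it tolerates the surrounding whitespace in the text node.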

score 1 · Accepted Answer

Use:

//node()[self::h4 or self::text()]
         [normalize-space() = 'Replaces the following numbers:']
           /following-sibling::*[1][self::table]

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "//node()[self::h4 or self::text()]
             [normalize-space() = 'Replaces the following numbers:']
               /following-sibling::*[1][self::table]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided document (corrected to be a well-formed XML document):

<div id="details">
 Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.     
    <table id="non-unique">
     ...
  </table>
  Replaces the following numbers:
    <table id="non-unique">
    ... good stuff in here
    </table>
</div>

the XPath expression is evaluated and the selected node is copied to the output:

<table id="non-unique">
    ... good stuff in here
    </table>

When the same transformation (XPath expression) is applied to this XML document:

<div id="details">
 Here is some text without a p tag. Oh, let's write some more.
    <br />
    <br />
    And some more.     
    <table id="non-unique">
     ...
  </table>
  <h4>Replaces the following numbers:</h4>
    <table id="non-unique">
    ... good stuff in here
    </table>
</div>

the wanted element is again selected and output:

<table id="non-unique">
    ... good stuff in here
    </table>
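
The two cases this expression covers (label as a bare text node, label inside an h4) can also be spelled out step by step in Python. The following is a rough sketch, not a replacement for the XPath: it assumes well-formed input, uses only xml.etree.ElementTree from the standard library, and the names normalize_space, table_right_after_label and TEMPLATE are invented here for illustration.

```python
import xml.etree.ElementTree as ET

LABEL = "Replaces the following numbers:"

def normalize_space(text):
    """Mimic XPath normalize-space(): trim and collapse runs of whitespace."""
    return " ".join((text or "").split())

def table_right_after_label(root, label=LABEL):
    """Emulate .../following-sibling::*[1][self::table] for a text or h4 label."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "table":
                continue
            # Case 1: the text node directly before this table is the label.
            before = parent.text if i == 0 else children[i - 1].tail
            if normalize_space(before) == label:
                return child
            # Case 2: the previous element is an <h4> whose text is the label.
            if i > 0:
                prev = children[i - 1]
                if prev.tag == "h4" and normalize_space("".join(prev.itertext())) == label:
                    return child
    return None

# Shortened, well-formed variant of the sample document.
TEMPLATE = """<div id="details">
    And some more.
    <table id="non-unique">...</table>
    {label}
    <table id="non-unique">... good stuff in here</table>
</div>"""

for label_markup in (LABEL, "<h4>" + LABEL + "</h4>"):
    root = ET.fromstring(TEMPLATE.format(label=label_markup))
    print(table_right_after_label(root).text)
```

Unlike a contains() test, this requires the normalized text to equal the label exactly, and it only accepts a table that is the very next element after the label.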

score -1 · Accepted Answer

No, because XPath requires well-formed XML in order to run.

See this answer, which provides some additional information.
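
If the markup cannot be made well-formed first, one possible workaround that sidesteps XPath entirely is an event-based scan with a tolerant parser. The sketch below uses html.parser from the Python standard library on the original, unclosed-br markup. The class name TableAfterLabel is hypothetical. It picks the next table in document order rather than a true following sibling, and it assumes the whole document is fed in one call so that the label arrives as a single text chunk.

```python
from html.parser import HTMLParser

class TableAfterLabel(HTMLParser):
    """Collect the text of the first <table> opened after the label text."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.label_seen = False
        self.depth = 0        # > 0 while inside the wanted table
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        if self.depth:
            self.depth += 1   # nested table inside the wanted one
        elif self.label_seen and not self.done:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)
        elif self.label in data:
            self.label_seen = True

html = """<div id="details">
    Here is some text without a p tag. Oh, let's write some more.
    <br>
    <br>
    And some more.
    <table id="non-unique">
        ...
    </table>
    Replaces the following numbers:
    <table id="non-unique">
        ... good stuff in here
    </table>
</div>"""

parser = TableAfterLabel("Replaces the following numbers")
parser.feed(html)
parser.close()
print("".join(parser.chunks).strip())
```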
