xpath - Uncle-based filtering in XPath

Question

Suppose I have an HTML table containing the following rows:

...
<tr>
  <th title="Library of Quintessential Memes">LQM:</th>
  <td>
    <a href="docs/lqm.html"><b>Intro</b></a>
    <a href="P/P79/">79</a>
    <a href="P/P80/">80</a>
    <a href="P/P81/">81</a>
    <a href="P/P82/">82</a>
  </td>
</tr>
<tr>
  <th title="Library of Boring Books">LBB:</th>
  <td>
    <a href="docs/lbb.html"><b>Intro</b></a>
    <a href="R/R80/">80</a>
    <a href="R/R81/">81</a>
    <a href="R/R82/">82</a>
    <a href="R/R83/">83</a>
    <a href="R/R84/">84</a>
  </td>
</tr>
...

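I would like to select all <a> elements inside a <td> whose associated <th> has text in a small, fixed set of headings (e.g. LQM, LBR, and RTT). How can I formulate this as an XPath query?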

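EDIT: I'm using Scrapy, a Python scraping toolkit, so if it is easier to formulate this query as a set of smaller queries, I'd be more than happy to do that. For example, if I could select all <tr> elements whose first <th> child matches a regular expression, and then select all <a> descendants of the remaining <tr> elements, that would be great.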

score 3 · Accepted Answer

The following XPath will work:

//a[contains(',LQM:,LBR:,RTT:,',
             concat(',', ancestor::td/preceding-sibling::th, ','))]

In theory, this could produce false positives (if your codes contain commas).
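To make the false-positive case concrete, here is a minimal Python sketch that mimics the delimiter trick with plain string operations. The helper names `delimited_match` and `exact_match` are hypothetical, and the header `LQM:,LBR:` is an invented example of a code containing a comma; it does not appear in the question's data.

```python
# The comma-delimited list used in the XPath contains() expression.
ALLOWED = ",LQM:,LBR:,RTT:,"


def delimited_match(header):
    """Mimic contains(ALLOWED, concat(',', header, ','))."""
    return ("," + header + ",") in ALLOWED


def exact_match(header):
    """Mimic the stricter per-value equality test."""
    return header in {"LQM:", "LBR:", "RTT:"}


# The last header is a made-up value that contains a comma.
for header in ["LQM:", "LBB:", "LQM:,LBR:"]:
    print(repr(header), delimited_match(header), exact_match(header))
```

The two checks agree on `LQM:` and `LBB:`. They disagree only on the comma-containing header, which the substring test accepts because it happens to span two adjacent entries of the list.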

A stricter version is:

//a[ancestor::td/preceding-sibling::th[.='LQM:']]
|//a[ancestor::td/preceding-sibling::th[.='LBR:']]
|//a[ancestor::td/preceding-sibling::th[.='RTT:']]

I tested this by wrapping your input in <table> tags and applying the following XSL transformation:

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <xsl:for-each select="//a[ancestor::td/preceding-sibling::th[.='LQM:']]
                                  |//a[ancestor::td/preceding-sibling::th[.='LBR:']]
                                  |//a[ancestor::td/preceding-sibling::th[.='RTT:']]">
            <xsl:text>
</xsl:text>
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </xsl:template>

</xsl:transform>

It produces the following output:

<a href="docs/lqm.html"><b>Intro</b></a>
<a href="P/P79/">79</a>
<a href="P/P80/">80</a>
<a href="P/P81/">81</a>
<a href="P/P82/">82</a>

Of course, if you are using XSL, you might find this construct more readable:

<xsl:for-each select="//a">
    <xsl:variable name="header" select="ancestor::td/preceding-sibling::th"/>

    <xsl:if test="$header='LQM:' or $header = 'LBR:' or $header = 'RTT:'">
        <xsl:text>
        </xsl:text>
        <xsl:copy-of select="."/>

    </xsl:if>
</xsl:for-each>
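Since the question mentions Python and is open to splitting the query into smaller steps, the same check can be sketched in two steps: pick the rows whose header is in the allowed set, then collect the links in those rows. This is only an illustration using the standard library's `xml.etree.ElementTree`, not Scrapy's selector API. It assumes the fragment is well-formed XML, uses a shortened copy of the sample rows, and the function name `links_for_headers` is hypothetical. Unlike the XPath `.='LQM:'` test, it strips surrounding whitespace from the header text before comparing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's rows, wrapped in <table> to give a single root.
HTML = """<table>
<tr>
  <th title="Library of Quintessential Memes">LQM:</th>
  <td>
    <a href="docs/lqm.html"><b>Intro</b></a>
    <a href="P/P79/">79</a>
  </td>
</tr>
<tr>
  <th title="Library of Boring Books">LBB:</th>
  <td>
    <a href="docs/lbb.html"><b>Intro</b></a>
    <a href="R/R80/">80</a>
  </td>
</tr>
</table>"""

WANTED = {"LQM:", "LBR:", "RTT:"}


def links_for_headers(root, wanted):
    """Return the href of every <a> in rows whose <th> text is in `wanted`."""
    hrefs = []
    for tr in root.iter("tr"):
        th = tr.find("th")  # direct <th> child of the row, if any
        if th is None:
            continue
        # Join all text inside <th>, like the XPath string value, then trim.
        header = "".join(th.itertext()).strip()
        if header in wanted:
            hrefs.extend(a.get("href") for a in tr.iter("a"))
    return hrefs


root = ET.fromstring(HTML)
print(links_for_headers(root, WANTED))
```

With Scrapy itself, the XPath expressions above could instead be passed directly to `response.xpath(...)`.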
