xml - GPath to find whether a table header contains a matching string

Question

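I'm using the NekoHTML parser to parse an HTML file into a well-formed XML document. However, I can't quite figure out the GPath that would let me identify the table containing the "Settings" string.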

```
def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)

def html = '''
    <html>
        <title>Hiya!</title>
    </html>
    <body>
        <table>
            <tr>
                <th colspan='3'>Settings</th>
                <td>First cell r1</td>
                <td>Second cell r1</td>
            </tr>
        </table>
        <table>
            <tr>
                <th colspan='3'>Other Settings</th>
                <td>First cell r2</td>
                <td>Second cell r2</td>
            </tr>
        </table>
'''

def slurper = new XmlSlurper(parser)
def page = slurper.parseText(html)
```

In this example, the first table should be selected so that I can iterate over the other row values in it. Can anyone help me with this GPath?

Edit: a side question. Why does

```
println page.HTML.HEAD.TITLE
```

print an empty string? Shouldn't it return the title?

score 1 · Accepted Answer

To get the table with "Settings" in its header, you should be able to do:

```
def settingsTableNode = page.BODY.TABLE.find { table ->
  table.TBODY.TR.TH.text() == 'Settings'
}
```
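A minimal, self-contained sketch of the same `find`, followed by the row iteration the question asks about. It assumes Groovy 3 or later (for `groovy.xml.XmlSlurper`) and feeds plain `XmlSlurper` hand-written, well-formed markup instead of running NekoHTML; the upper-case element names and the `TBODY` wrapper are assumptions about what NekoHTML produces, so check them against your actual parse.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical, already well-formed markup standing in for NekoHTML's output.
// Assumption: NekoHTML upper-cases element names and wraps rows in TBODY,
// so those elements are written out explicitly here.
def xml = '''
<HTML>
  <BODY>
    <TABLE>
      <TBODY>
        <TR><TH colspan="3">Settings</TH><TD>First cell r1</TD><TD>Second cell r1</TD></TR>
      </TBODY>
    </TABLE>
    <TABLE>
      <TBODY>
        <TR><TH colspan="3">Other Settings</TH><TD>First cell r2</TD><TD>Second cell r2</TD></TR>
      </TBODY>
    </TABLE>
  </BODY>
</HTML>
'''

def page = new XmlSlurper().parseText(xml)

// Same GPath as the answer: the first TABLE whose header text is exactly 'Settings'.
def settingsTable = page.BODY.TABLE.find { table ->
    table.TBODY.TR.TH.text() == 'Settings'
}

// Collect the data cells of every row in the matched table.
def cells = []
settingsTable.TBODY.TR.each { row ->
    row.TD.each { td -> cells << td.text() }
}
println cells   // [First cell r1, Second cell r1]
```

The comparison is exact, so only the first table matches. Replacing `==` with `.contains('Settings')` would also match `Other Settings`, and `find` would then return whichever matching table comes first in document order.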

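`page` already refers to the root element of the document (the `HTML` element), so you don't need the `HTML` step in the path. All you need is: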
```
println page.HEAD.TITLE
```
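To see why the extra `HTML` step yields nothing, here is a small sketch, again assuming Groovy 3 or later and plain `XmlSlurper` on hand-written, NekoHTML-shaped markup rather than NekoHTML itself. The object returned by `parseText` is the root element, so asking it for an `HTML` child matches nothing, and an empty GPath result prints as an empty string.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical well-formed document shaped like NekoHTML output (upper-case names).
def page = new XmlSlurper().parseText('<HTML><HEAD><TITLE>Hiya!</TITLE></HEAD><BODY/></HTML>')

// parseText returns the root element itself, so page already is HTML.
println page.name()                            // HTML
// HEAD is a direct child of the root, so navigation starts there.
println page.HEAD.TITLE.text()                 // Hiya!
// The root has no HTML child, so this path is empty and its text is ''.
println page.HTML.HEAD.TITLE.text().isEmpty()  // true
```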
