xslt-1.0 - Select nodes that contain mixed content or text only, using XPath

Question

Using XPath 1.0 and XSLT 1.0, I need to select the immediate parents of mixed content or of text only. Consider the following example:

<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>

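Obviously the classes match, maybe-match and dont-match are for demonstration purposes only and cannot be used for matching. maybe-match means the element should preferably not be matched, but I can work around that myself if excluding those turns out to be hard.

Thanks in advance!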

score 2 · Accepted Answer

To get the matches and the maybe-matches you can use

 //*[count(text())>=1]

if your XML parser ignores whitespace-only text nodes, or otherwise

//*[normalize-space(string(./text())) != ""]

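The maybe-matches can then be filtered out by checking whether some ancestor also matches, but it gets ugly (shown here for the case where whitespace-only text nodes are present):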

//*[(normalize-space(string(./text())) != "") and count(./ancestor::*[normalize-space(string(./text())) != ""]) = 0]
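
One caveat worth checking: in XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, so string(./text()) looks at the first text child alone. The sketch below is not XPath; it mirrors that rule by hand with Python's standard xml.etree.ElementTree, which implements only a limited XPath subset. The helper names (text_nodes, first_text, any_text) and the sample element are made up for illustration. It suggests that an element whose first text node is whitespace-only would be missed by the string(./text()) form, while a per-node test such as text()[normalize-space()] would still find it.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the whitespace characters normalize-space() strips


def text_nodes(elem):
    """Approximate the element's text() children in document order."""
    chunks = [elem.text] + [child.tail for child in elem]
    return [c for c in chunks if c]


def first_text(elem):
    """Mimic normalize-space(string(./text())) != "": first text node only."""
    nodes = text_nodes(elem)
    return bool(nodes) and nodes[0].strip(XML_WS) != ""


def any_text(elem):
    """Mimic text()[normalize-space()]: any non-blank text node counts."""
    return any(n.strip(XML_WS) for n in text_nodes(elem))


# Hypothetical element: indentation first, then an image, then real text.
div = ET.fromstring('<div>\n  <img src="x.png"/> caption</div>')

print(first_text(div))  # False: the first text node is only "\n  "
print(any_text(div))    # True: the second text node is " caption"
```

If the input never has whitespace-only text before the real text (or the parser drops such nodes), the two forms should agree.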

score 2 · Accepted Answer

For the "matches" use:

//*[text()[normalize-space()] and not(../text()[normalize-space()])]

For the "maybe-matches" use:

//*[../text()[normalize-space()]]

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select=
      "//*[text()[normalize-space()] and not(../text()[normalize-space()])]"/>
==========
   <xsl:copy-of select="//*[../text()[normalize-space()]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML (wrapped in a single top element so that it becomes a well-formed XML document):

<t>
<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>
</t>

each of the two XPath expressions is evaluated, and the selected nodes are copied to the output:

<td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
<td class="match">Plain text in here.</td>
<div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
<p class="match">Plain text in here.</p>
==========
   <strong class="maybe-match">content</strong>
<em class="maybe-match">content</em>

As we can see, both expressions select exactly the wanted elements.
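
For readers who want to cross-check the selection without an XSLT processor, here is a minimal sketch in Python using only the standard library. It does not evaluate the XPath expressions themselves (xml.etree.ElementTree supports only a limited XPath subset); instead it spells out the two predicates by hand. The helper has_text and the parent_of map are names invented for this illustration, and the sketch assumes the document contains no comments or processing instructions.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the whitespace characters normalize-space() strips

DOC = """<t>
<table class="dont-match">
    <tr class="dont-match">
        <td class="match">Mixed <strong class="maybe-match">content</strong> in here.</td>
        <td class="match">Plain text in here.</td>
        <td class="dont-match"><img src="..." /></td>
    </tr>
</table>
<div class="dont-match">
    <div class="dont-match"><img src="..." /></div>
    <div class="match">Mixed <em class="maybe-match">content</em> in here.</div>
    <p class="match">Plain text in here.</p>
</div>
</t>"""


def has_text(elem):
    """Mimic text()[normalize-space()]: some text-node child is non-blank."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(c and c.strip(XML_WS) for c in chunks)


root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}

matches, maybe_matches = [], []
for elem in root.iter():  # iter() walks the tree in document order
    parent = parent_of.get(elem)
    if parent is not None and has_text(parent):
        # //*[../text()[normalize-space()]]
        maybe_matches.append(elem.tag)
    elif has_text(elem):
        # //*[text()[normalize-space()] and not(../text()[normalize-space()])]
        matches.append(elem.tag)

print(matches)        # ['td', 'td', 'div', 'p']
print(maybe_matches)  # ['strong', 'em']
```

The tags line up with the elements copied by the stylesheet above: four "match" elements and the two "maybe-match" ones.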
