xml - How to select text that matches a condition from a long string in XPath

Question

Here is an XML document:

<book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
</book>

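I was asked to use XPath to find the authors whose last name starts with a capital "C". With this document the problem is easy, because only one author qualifies: I can use substring-after() to take the text after the space and then check whether it starts with "C". However, a name may be longer and include middle names, for example Kurt Van Persie Cagle. How can I extract exactly the substring after the last space?

Please explain which XPath functions to use and show how to use them.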

score 0 · Accepted Answer

I was asked to use XPath to find the authors whose last name starts with a capital "C".

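In general, this cannot be selected with a single XPath 1.0 expression when the number of words in a name is not known in advance: XPath 1.0 has no function that returns the text after the last occurrence of a separator, and it cannot loop or recurse. It can, of course, be done with XSLT 1.0.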
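As a rough illustration of the XSLT 1.0 route, here is a minimal sketch that finds the last word with a recursive named template. The template name last-word, its text parameter, and the lastWord variable are hypothetical names chosen for this example, not part of any library. The sketch also calls normalize-space() first, which is an added assumption so that stray leading or trailing spaces do not produce an empty last word.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Copy each author whose last space-separated word starts with 'C'. -->
  <xsl:template match="/">
    <xsl:for-each select="/*/author">
      <xsl:variable name="lastWord">
        <xsl:call-template name="last-word">
          <!-- Assumption: collapse extra whitespace before splitting. -->
          <xsl:with-param name="text" select="normalize-space(.)"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:if test="starts-with($lastWord, 'C')">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Hypothetical helper: keep dropping the text up to the next space
       until no space is left, then output the remaining word. -->
  <xsl:template name="last-word">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, ' ')">
        <xsl:call-template name="last-word">
          <xsl:with-param name="text" select="substring-after($text, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the XML document from the question, this should copy only the Kurt Cagle author element. The recursion is what a single XPath 1.0 expression cannot express, which is why the work moves into the stylesheet.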

Using XPath 2.0:

/*/author[starts-with(tokenize(., ' ')[last()], 'C')]

XSLT 2.0-based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:sequence select="/*/author[starts-with(tokenize(., ' ')[last()], 'C')]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt van Persy Cantor Bagle</author>
    <author>Kurt van Persy Cantor Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
</book>

the XPath expression is evaluated and the selected nodes are copied to the output:

<author>Kurt van Persy Cantor Cagle</author>

score 0 · Accepted Answer

You can use a "messy" XPath 1.0 expression if, for example, author names are limited to at most 4 words:

//author[
    (starts-with(substring-after(., ' '), 'C') and not(contains(substring-after(., ' '), ' ')))
    or
    (starts-with(substring-after(substring-after(., ' '), ' '), 'C') and not(contains(substring-after(substring-after(., ' '), ' '), ' ')))
    or
    (starts-with(substring-after(substring-after(substring-after(., ' '), ' '), ' '), 'C') and not(contains(substring-after(substring-after(substring-after(., ' '), ' '), ' '), ' ')))
]

Input:

<book>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>James Linn</author>
    <author>Kurt Van Persie Cagle</author>
</book>

The XPath above selects 2 authors: Kurt Cagle and Kurt Van Persie Cagle. You can extend this XPath to match authors with 5 words, and so on... :)
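To make the extension concrete, here is a sketch of the same expression with one more branch for five-word names, wrapped in a small XSLT 1.0 stylesheet so it can be run. The wrapper stylesheet is only an assumption for testing; the predicate itself is plain XPath 1.0 and follows the same pattern as the three branches above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <!-- The four branches cover names of 2, 3, 4 and 5 words, in that order.
         Each branch skips one more leading word, then requires that the
         remainder starts with 'C' and contains no further space. -->
    <xsl:copy-of select="//author[
      (starts-with(substring-after(., ' '), 'C')
        and not(contains(substring-after(., ' '), ' ')))
      or
      (starts-with(substring-after(substring-after(., ' '), ' '), 'C')
        and not(contains(substring-after(substring-after(., ' '), ' '), ' ')))
      or
      (starts-with(substring-after(substring-after(substring-after(., ' '), ' '), ' '), 'C')
        and not(contains(substring-after(substring-after(substring-after(., ' '), ' '), ' '), ' ')))
      or
      (starts-with(substring-after(substring-after(substring-after(substring-after(., ' '), ' '), ' '), ' '), 'C')
        and not(contains(substring-after(substring-after(substring-after(substring-after(., ' '), ' '), ' '), ' '), ' ')))
    ]"/>
  </xsl:template>
</xsl:stylesheet>
```

With the added branch, a five-word name such as Kurt van Persy Cantor Cagle would also match, while Kurt van Persy Cantor Bagle would not, because "Cantor Bagle" still contains a space. The expression grows by one branch per extra word, so this approach only suits inputs with a known, small upper bound on name length.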

score 0 · Accepted Answer

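Following up on @DimitreNovaatchev's excellent solution: note that if your XSLT processor supports EXSLT's string extension functions, you can use the same tokenization approach in XSLT 1.0.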

For example, this EXSLT-enabled XSLT 1.0 solution:

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:str="http://exslt.org/strings"
  exclude-result-prefixes="str"
  version="1.0">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:copy-of
      select="/*/author[starts-with(str:tokenize(., ' ')[last()], 'C')]" />
  </xsl:template>

</xsl:stylesheet>

...produces the same desired result when applied to @Dimitre's modified input XML:

<author>Kurt van Persy Cantor Cagle</author>
