xml - Complex xPath query

Question

I need to write a quite complex XSLT 1.0 query.

Given the following XML file, I need a query that returns the set of authors who appear in more than one report (for example, Antonio Rossi, because he is on both report 1 and report 2).

<reports>
  <report id="01">
    <titolo>
      I venti del Nord
    </titolo>
    <autori>
      <autore>
        Antonio Rossi
      </autore>
      <autore>
        Mario Verdi
      </autore>
    </autori>
    <versioni>
      <versione numero="1.0">
        <data>
          13-08-1980
        </data>
        <autore>
          Mario Verdi
        </autore>
        <commento>
          versione iniziale
        </commento>
      </versione>
      <versione numero="2.0">
        <data>
          14-08-1981
        </data>
        <autore>
          Antonio Rossi
        </autore>
        <commento>
          poche modifiche
        </commento>
      </versione>
    </versioni>
  </report>
  <report id="02">
    <titolo>
      Le pioggie del Nord
    </titolo>
    <autori>
      <autore>
        Antonio Rossi
      </autore>
      <autore>
        Luca Bianchi
      </autore>
    </autori>
    <versioni>
      <versione numero="1.0">
        <data>
          13-12-1991
        </data>
        <autore>
          Antonio Rossi
        </autore>
        <commento>
          versione iniziale
        </commento>
      </versione>
      <versione numero="2.0">
        <data>
          14-08-1992
        </data>
        <autore>
          Antonio Rossi
        </autore>
        <commento>
          modifiche al cap. 1
        </commento>
      </versione>
      <versione numero="3.0">
        <data>
          18-08-1992
        </data>
        <autore>
          Antonio Rossi
        </autore>
        <commento>
          Aggiunta intro.
        </commento>
      </versione>
      <versione numero="4.0">
        <data>
          13-01-1992
        </data>
        <autore>
          Luca Bianchi
        </autore>
        <commento>
          Modifiche sostanziali.
        </commento>
      </versione>
    </versioni>
  </report>
  <report id="03">
    <titolo>
      Precipitazioni nevose
    </titolo>
    <autori>
      <autore>
        Fabio Verdi
      </autore>
      <autore>
        Luca Bianchi
      </autore>
    </autori>
    <versioni>
      <versione numero="1.0">
        <data>
          11-01-1992
        </data>
        <autore>
          Fabio Verdi
        </autore>
        <commento>
          versione iniziale
        </commento>
      </versione>
      <versione numero="2.0">
        <data>
          13-01-1992
        </data>
        <autore>
          Luca Bianchi
        </autore>
        <commento>
          Aggiornato indice
        </commento>
      </versione>
    </versioni>
  </report>
</reports>

score 5 · Accepted Answer

If you can use XPath 2.0, you can use:

distinct-values(/reports/report/autori/autore[preceding::report/autori/autore = . or following::report/autori/autore = .])

With your input XML, it returns:

Antonio Rossi
Luca Bianchi

score 3 · Accepted Answer

I. This simple XSLT 1.0 transformation (no xsl:for-each, no variables):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:key name="kAuthorByVal" match="autori/autore" use="normalize-space()"/>

  <xsl:template match="/">
   <xsl:copy-of select=
    "//autori/autore
                  [generate-id()
                  =
                   generate-id(key('kAuthorByVal', normalize-space())[1])
                   ]
                  [key('kAuthorByVal', normalize-space())[2]]"/>
  </xsl:template>
</xsl:stylesheet>

when applied to the XML document provided in the question:

produces the wanted, correct result:

<autore>
            Antonio Rossi
          </autore>
<autore>
            Luca Bianchi
          </autore>

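Explanation:

A key observation is that an autori/autore with a particular string value cannot occur twice within the same report. This simplifies the solution considerably (for a more complicated solution, see the earlier versions of this answer). All solutions presented in this answer rely on this observation.
We define a key that identifies an autori/autore by its normalized string value. Thus two autori/autore elements that differ only in whitespace but name the same author are treated as instances of the same author.
Using the Muenchian grouping method, we select the set of all autori/autore elements that each have a distinct normalized string value.
For every autori/autore selected this way, we also test whether a second autori/autore with the same normalized string value exists. We select all autori/autore elements that pass this test, and this node-set is exactly what the question asks for.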
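To make the grouping logic concrete outside XSLT, here is a minimal Python sketch of the same idea: group the autori/autore elements by normalized string value, then keep the groups that have a second member. It is an illustration, not a translation of the stylesheet. The trimmed SAMPLE document and the helper names normalize_space and authors_on_multiple_reports are assumptions made for this example, and normalize_space only approximates the XPath function.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, hypothetical version of the question's document (autori only),
# with deliberately inconsistent whitespace around some names.
SAMPLE = """<reports>
  <report id="01"><autori><autore>
        Antonio Rossi
      </autore><autore>Mario Verdi</autore></autori></report>
  <report id="02"><autori><autore>Antonio Rossi</autore><autore> Luca  Bianchi </autore></autori></report>
  <report id="03"><autori><autore>Fabio Verdi</autore><autore>Luca Bianchi</autore></autori></report>
</reports>"""


def normalize_space(text):
    # Rough stand-in for XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join((text or "").split())


def authors_on_multiple_reports(xml_text):
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)  # plays the role of the xsl:key
    for autore in root.findall("./report/autori/autore"):
        groups[normalize_space(autore.text)].append(autore)
    # Keep one name per group, and only groups that have a second member,
    # mirroring the [1] and [2] predicates in the stylesheet.
    return [name for name, members in groups.items() if len(members) >= 2]


result = authors_on_multiple_reports(SAMPLE)
print(result)
```

Because Python dictionaries keep insertion order, the names come out in the order of their first occurrence in the document.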

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

 <xsl:variable name="vSeq" select="//autori/autore/normalize-space()"/>
 <xsl:template match="/">
     <xsl:value-of select="$vSeq[index-of($vSeq,.)[2]]" separator="&#xA;"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (above), the wanted, correct result is produced:

Antonio Rossi
Luca Bianchi

Explanation:

Here we use this answer and define $vSeq accordingly.
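The predicate in $vSeq[index-of($vSeq,.)[2]] is compact, so a short Python sketch may help. index-of($vSeq, .) yields the 1-based positions at which the current value occurs. Taking [2] gives the position of its second occurrence, or nothing if there is none. Because a numeric predicate is compared with the context position, the filter keeps exactly the item that sits at that second position. The function names index_of and repeated_values and the sample list are assumptions made for this illustration.

```python
def index_of(seq, value):
    # 1-based positions at which value occurs, like XPath 2.0 index-of().
    return [i for i, item in enumerate(seq, start=1) if item == value]


def repeated_values(seq):
    selected = []
    for position, item in enumerate(seq, start=1):
        positions = index_of(seq, item)
        # positions[1] plays the role of index-of($vSeq, .)[2];
        # the item is kept only if it is itself the second occurrence.
        if len(positions) >= 2 and positions[1] == position:
            selected.append(item)
    return selected


v_seq = [
    "Antonio Rossi", "Mario Verdi",
    "Antonio Rossi", "Luca Bianchi",
    "Fabio Verdi", "Luca Bianchi",
]
result = repeated_values(v_seq)
print(result)
```

Since only the second occurrence is kept, a value that appears three or more times is still reported once.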

III. A single XPath 3.0 (and XQuery 3.0) expression solution:

let $vSeq := //autori/autore/normalize-space()
 return
    $vSeq[index-of($vSeq,.)[2]]

score 3 · Accepted Answer

This works even in XPath 1.0:

//report//autore[text()=../../following-sibling::report//autore/text()]

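It selects all autore nodes whose text content equals that of any autore node in any of the following report nodes.

Or, to keep it short, if there is nothing really tricky in your real XML file, even this should work: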

//autore[text()=../../following-sibling::*//autore/text()]

Edit: this works only by accident. See the comments below.
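One thing to watch with expressions that compare text() directly: in XPath 1.0 the = comparison works on the exact string values, indentation included, whereas the keyed answers in this thread compare normalize-space() values. A minimal Python sketch of the difference follows. The two sample strings imitate the same name indented at two different depths, and the helper name normalize_space is an assumption made for this illustration.

```python
def normalize_space(text):
    # Rough stand-in for XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(text.split())


# The same author name, indented at two different depths.
listed = "\n        Antonio Rossi\n      "
versioned = "\n          Antonio Rossi\n        "

raw_equal = listed == versioned
normalized_equal = normalize_space(listed) == normalize_space(versioned)
print(raw_equal, normalized_equal)
```

With raw comparison, a match therefore depends on the two elements being indented identically in the file.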

score 2 · Accepted Answer

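Congratulations to DevNull for posting the first correct answer at the time. When he posted, it was not yet known that the OP wanted an XSLT 1.0 solution. I provide one below.

Getting distinct values in XSLT 1.0 in any efficient way requires Muenchian grouping. Here is how to do it in XSLT 1.0...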

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<xsl:key name="kAuthors" match="autori/autore" use="normalize-space()" />

<xsl:template match="/">
The set of authors on multiple reports
====================================== 
<xsl:for-each select="reports/report/autori/autore[
   generate-id()=
   generate-id( key('kAuthors',normalize-space())[1])]">
  <xsl:variable name="author" select="normalize-space()" />   
  <xsl:for-each select="key('kAuthors',$author)[2]">
   <xsl:value-of select="concat($author,'&#x0A;')" /> 
  </xsl:for-each>
 </xsl:for-each>  
</xsl:template>

</xsl:stylesheet>

When the above stylesheet is applied to the OP's sample data, it produces this text document...

The set of authors on multiple reports
====================================== 
Antonio Rossi
Luca Bianchi

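Explanation

In each report, the authors appear twice: once under autori and again under versione. We must not double count within a report, so the key's match pattern is autori/autore. The key value is the author name as a string, so the key groups the elements by author.

We use standard Muenchian grouping to iterate over the authors. This is the outer for-each. Now we are only interested in the "repeat offenders". We get them by applying the [2] predicate in the inner loop. Authors who appear in at most one report are filtered out, because their group has only one member.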
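The effect of restricting the key to autori/autore can be seen in a small Python sketch that counts names two ways: over every autore element, and over autori/autore only. The trimmed SAMPLE document and the helper name names_seen_twice are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed, hypothetical version of the sample data: Mario Verdi is on one
# report only, but is named under both autori and versione in that report.
SAMPLE = """<reports>
  <report id="01">
    <autori><autore>Antonio Rossi</autore><autore>Mario Verdi</autore></autori>
    <versioni><versione numero="1.0"><autore>Mario Verdi</autore></versione></versioni>
  </report>
  <report id="02">
    <autori><autore>Antonio Rossi</autore></autori>
    <versioni><versione numero="1.0"><autore>Antonio Rossi</autore></versione></versioni>
  </report>
</reports>"""


def names_seen_twice(elements):
    # Count normalized names and keep those seen at least twice.
    counts = Counter(" ".join((e.text or "").split()) for e in elements)
    return sorted(name for name, n in counts.items() if n >= 2)


root = ET.fromstring(SAMPLE)
all_autore = names_seen_twice(root.iter("autore"))
only_autori = names_seen_twice(root.findall("./report/autori/autore"))
print(all_autore)
print(only_autori)
```

Counting every autore wrongly reports Mario Verdi, who is on a single report. Counting autori/autore only reports Antonio Rossi alone.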
