regex - Does XSLT provide a way to identify XML elements using regular expressions?

Question

I have a sample XML file that looks like this:

--- before transformation ---
<root-node>

   <child-type-A> ... </child-type-A>
   <child-type-A> ... </child-type-A>
   <child-type-B> ... </child-type-B>
   <child-type-C>
      <child-type-B> ... </child-type-B>
      ...
   </child-type-C>


   ...

</root-node>

I want to transform this XML file into something like this:

--- after transformation ---
<root-node>

   <child-node> ... </child-node>
   <child-node> ... </child-node>
   <child-node> ... </child-node>
   <child-node>
      <child-node> ... </child-node>
      ...
   </child-node>

   ...

</root-node>

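In effect, the document structure stays the same, but some "selected" elements are renamed. These selected elements share the same prefix ("child-type-" in this example) but have different suffixes ("A" | "B" | "C" | etc.).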

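Why go to all this trouble? I have a piece of software that needs an XML file as input. For convenience, I edit the XML file with the help of an XML schema, which helps ensure that the file is correct. Unfortunately, XML Schema is somewhat lacking when it comes to context sensitivity. As a result, the XML files look like the one shown under "before transformation". The software cannot process such a file, because it needs a file like the one shown under "after transformation". Hence the need for the transformation.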

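I want to do the transformation with XSLT, and I have already figured out how. My approach is to define one rule for the identity transformation and one rule for each "child-type-*" element that needs to be renamed. This solution works, but it is not very elegant: you end up with a lot of rules.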

--- sample transformation rules ---

<!-- Identity transformation -->
<xsl:template match="@*|node()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
   </xsl:copy>
</xsl:template>

<xsl:template match="child-type-A">
   <xsl:element name="child-node">
      <xsl:apply-templates select="@*|node()" />
   </xsl:element>
</xsl:template>

...

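Is there a way to condense this into two rules: one for the identity transformation and one for all "child-type-*" elements? Perhaps by combining XSLT with regular expressions? Or do you have to take a different approach to this kind of problem?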

score 2 · Accepted Answer

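(Revised my answer)

This snippet works for your sample XML. I merged the two templates because both of them want to act on "all elements". My earlier templates did not work because they both matched the same selection.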

<xsl:template match="@*|node()">
    <xsl:choose>
        <xsl:when test="starts-with(name(), 'child-type')">
            <xsl:element name="child-node">
                <xsl:apply-templates select="@*|node()"/>
            </xsl:element>
        </xsl:when>
        <xsl:otherwise>
           <xsl:copy>
              <xsl:apply-templates select="@*|node()" />
           </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

Given your source XML:

<root-node>
   <child-type-A> ... </child-type-A>
   <child-type-A> ... </child-type-A>
   <child-type-B> ... </child-type-B>
   <child-type-C>
      <child-type-B> ... </child-type-B>
   </child-type-C>
</root-node>

This produces the following output:

<root-node>
<child-node> ... </child-node>
<child-node> ... </child-node>
<child-node> ... </child-node>
<child-node>
    <child-node> ... </child-node>
</child-node>
</root-node>

score 1 · Accepted Answer

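It is not a good idea to capture information by attaching meaning to the internal syntax of element names. In the extreme case, you could have an XML document in which all the information is captured in the name of the root element: <Surname_Kay.Firstname_Michael.Country_UK/>. But if you do have data in that form, it is certainly possible to process it, for example with a template rule of the form

<xsl:template match="*[matches(name(), 'child-type-[A-Z]')]">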
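
As a rough sketch only, a complete stylesheet built around such a rule might look like the following. The `^` and `$` anchors and the output name `child-node` are assumptions taken from the question's example rather than part of the answer above. Without anchors, `matches()` succeeds if the pattern occurs anywhere in the name. `matches()` requires an XSLT 2.0 (or later) processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename rule: element names made of "child-type-" plus exactly one
       uppercase ASCII letter. The anchors are an assumption; drop them
       to accept longer suffixes. -->
  <xsl:template match="*[matches(name(), '^child-type-[A-Z]$')]">
    <xsl:element name="child-node">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The two rules do not conflict: a pattern with a predicate has a default priority of 0.5, while `node()` and `@*` have -0.5, so the rename rule wins for the elements it matches.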

score 1 · Accepted Answer

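XSLT has a starts-with function that can be used to identify elements whose names start with 'child-type', which lets you use a single template match. See this related question:

Selecting elements whose names start with a given string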
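
A minimal sketch of that idea in XSLT 1.0, assuming the prefix `child-type-` and the new name `child-node` from the question (the stylesheet below is illustrative and is not taken from the linked question):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rule 1: identity transformation. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rule 2: any element whose name starts with the assumed prefix
       is re-created as child-node; attributes and children are kept. -->
  <xsl:template match="*[starts-with(name(), 'child-type-')]">
    <xsl:element name="child-node">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

This keeps the two-rule shape the question asks for. Note that `starts-with` accepts any suffix at all, so an element named `child-type-anything` would be renamed too. If only specific suffixes should match, the predicate needs an extra condition.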

score 0 · Accepted Answer

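I. Here is a generic XSLT 1.0 transformation that works with a parameter specifying the wanted prefixes and, for each prefix, a set of suffixes, so that any element whose name has this prefix and one of these suffixes is renamed to the wanted new name: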

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my" exclude-result-prefixes="my" >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <my:renames>
  <rename prefix="child-type-"
          newVal="child-node">
    <suffix>A</suffix>
    <suffix>B</suffix>
    <suffix>C</suffix>
  </rename>
 </my:renames>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*//*">
  <xsl:choose>
  <xsl:when test=
   "document('')/*
         /my:renames
           /rename
             [@prefix[starts-with(name(current()),.)]
            and
              suffix
               [substring(name(current()),
                          string-length(name(current()))
                          - string-length(.) +1
                          )
               =
                 .
               ]
              ]
    ">

  <xsl:variable name="vNewName" select=
   "document('')/*
         /my:renames
           /rename
             [@prefix[starts-with(name(current()),.)]
            and
              suffix
               [substring(name(current()),
                          string-length(name(current()))
                          -string-length(.) +1
                          )
               =
                 .
               ]
              ]
              /@newVal
   "/>

      <xsl:element name="{$vNewName}">
       <xsl:apply-templates select="node()|@*"/>
      </xsl:element>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="identity"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<root-node>
    <child-type-A> ... </child-type-A>
    <child-type-A> ... </child-type-A>
    <child-type-B> ... </child-type-B>
    <child-type-C>
      <child-type-B> ... </child-type-B>
      ...
    </child-type-C>
      ...
</root-node>

the wanted, correct result is produced:

<root-node>
   <child-node> ... </child-node>
   <child-node> ... </child-node>
   <child-node> ... </child-node>
   <child-node>
      <child-node> ... </child-node>
      ...
    </child-node>
      ...
</root-node>

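Note: with this transformation you can rename, in a single pass, different elements with different prefixes and their associated suffixes, specified as an external parameter/document.

II. An equivalent XSLT 2.0 solution: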

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vRules">
  <rule prefix="^child\-type\-" newVal="child-node">
    <suffix>A$</suffix>
    <suffix>B$</suffix>
    <suffix>C$</suffix>
  </rule>
 </xsl:variable>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "*[for $n in name(.),
         $r in $vRules/*
                 [matches($n, @prefix)], 
         $s in $vRules/*/suffix
                 [matches($n, .)]
      return $r and $s
    ]">

    <xsl:variable name="vN" select="name()"/>

    <xsl:variable name="vNewName" select=
     "$vRules/*
           [matches($vN, @prefix)
           and 
            suffix[matches($vN, .)]
           ]
           /@newVal
     "/>
   <xsl:element name="{$vNewName}">
    <xsl:apply-templates select="node()|@*"/>
   </xsl:element>
 </xsl:template>
</xsl:stylesheet>

When applied to the same XML document (above), the same correct output is produced.
