
I have this XML file:

<?xml version="1.0" encoding="iso-8859-1"?>
<doclist>
<text attribute="a">This is a <tag1>sentence</tag1> <tag1>with</tag1> a few            
<tag1>words</tag1>.</text>
<!-- many more text nodes with none, one or several '<tag1>' in it -->
</doclist>

I want to get this result:

<?xml version="1.0" encoding="iso-8859-1"?>
<doclist>
<text attribute="a">This is a <tag1>sentence with</tag1> a few <tag1>words</tag1>. 
</text>
<!-- many more text nodes with none, one or several '<tag1>'s in it -->
</doclist>

I tried to do it with a regular expression:

<xsl:template match="text">
  <text>
    <xsl:apply-templates select="@*"/> <!-- templ. to copy attributes of text -->
    <xsl:analyze-string select="."
        regex="&lt;tag1>(.+)&lt;tag1>&lt;tag1>(.+)&lt;/tag1>">
      <!-- also tried . instead of &lt; -->
      <xsl:matching-substring>
        <xsl:for-each select=".">
          <tag1>
            <xsl:value-of select="regex-group(1)"/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="regex-group(2)"/>
          </tag1>
        </xsl:for-each>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:for-each select=".">
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </text>
</xsl:template>

But my output looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<doclist>
<text attribute="a">This is a sentencewitha few words. 
</text>
<!-- many more text nodes with none, one or several '<tag1>'s in it -->
</doclist>

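My guess is that no match is found, because no `<tag1>` appears in the result. But I don't understand why only the words that were surrounded by tags lose their spaces... How can I correctly collapse `<tag1>`s that are direct neighbours?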

Answer

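If you need to operate on nodes (mixed content of element nodes and text nodes), use `xsl:for-each-group` with `group-adjacent`; `xsl:analyze-string` cannot be used to operate on element nodes. It works on a string: its `select` expression is atomized, so for a `text` element it only sees the concatenated character data, and the `<tag1>` markup your regex is looking for is simply not there.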
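To see why the regex in the question never matches, it can help to print the string that `xsl:analyze-string` actually receives. The following is a minimal diagnostic sketch, assuming an XSLT 2.0 processor such as Saxon 9; the `debug` and `string-value` output elements are hypothetical names used only for illustration. It writes out the string value of each `text` element, which is the concatenation of all its descendant text nodes, with every element tag dropped.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Diagnostic only: show the string that xsl:analyze-string select="." would see -->
<xsl:template match="/">
  <debug>
    <xsl:for-each select="//text">
      <!-- string(.) concatenates all descendant text nodes; the <tag1> tags are not part of it -->
      <string-value>
        <xsl:value-of select="string(.)"/>
      </string-value>
    </xsl:for-each>
  </debug>
</xsl:template>

</xsl:stylesheet>
```

For the sample input this gives `This is a sentence with a few`, the trailing spaces and line break from the input, then `words.`: there is no `<` character anywhere, so a regex built around `&lt;tag1>` has nothing to match.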

So I think that

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="text">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:for-each-group select="node()" group-adjacent="self::tag1 or self::text()[not(normalize-space())]">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <tag1>
            <xsl:apply-templates select="current-group()"/>
          </tag1>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:template match="text/tag1">
  <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>

should do the job. Applied with Saxon 9, that stylesheet transforms the input

<doclist>
<text attribute="a">This is a <tag1>sentence</tag1> <tag1>with</tag1> a few            
<tag1>words</tag1>.</text>
<!-- many more text nodes with none, one or several '<tag1>' in it -->
</doclist>

into the result

<doclist>
<text attribute="a">This is a <tag1>sentence with</tag1> a few
<tag1>words</tag1>.</text>
<!-- many more text nodes with none, one or several '<tag1>' in it -->
</doclist>
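The grouping key does the real work here: every child node of `text` is classified as `true` (a `tag1` element, or a whitespace-only text node that may sit between two of them) or `false` (anything else), and `group-adjacent` starts a new group whenever that value changes. A minimal diagnostic sketch that makes the groups visible, assuming the same input and an XSLT 2.0 processor; the `group` element and its `key` attribute are hypothetical, for illustration only:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template, as in the answer: copy everything else unchanged -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<!-- Diagnostic only: wrap each adjacent group in a hypothetical <group> element -->
<xsl:template match="text">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:for-each-group select="node()"
        group-adjacent="self::tag1 or self::text()[not(normalize-space())]">
      <!-- key is true for tag1 elements and whitespace-only text nodes, false otherwise -->
      <group key="{current-grouping-key()}">
        <xsl:copy-of select="current-group()"/>
      </group>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

For the sample `text` element this yields five groups: `This is a ` (false), `<tag1>sentence</tag1> <tag1>with</tag1>` (true), the ` a few` text with its trailing whitespace (false), `<tag1>words</tag1>` (true) and `.` (false). The same key points to an edge case worth testing with more complex input: a whitespace-only text node that is not between two `tag1` elements (for example leading whitespace at the start of `text`) also gets the key `true`, so the answer's stylesheet would merge it into, or wrap it in, a `tag1` element.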

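I think this approach should also work for more complex input samples, but please test it yourself and report back. If there are problems, add a more complex input sample to the question so that we can test against it.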

answered 2013-07-03T12:24:57.330