xslt - XSLT: how to trim whitespace before and after an element when the element says so?

Question

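This problem arises when formatting text documents that use TEI markup (www.tei-c.org). It is beyond my XSLT/XPath skills. (A solution in XSLT/XPath 1.0 is needed.)

There is a markup element, <lb>, that marks a line break. It can take an attribute, @break. If @break="no", any whitespace between the <lb> and the surrounding text should be ignored when generating output.

So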

This little tea <lb break="no" />
pot, short and stout.

should be read as

This little teapot, short and stout.

That is, the space after "tea" and the line break before "pot" should not be rendered in the output stream.

For the space before the <lb>, this might work:

<xsl:template match="text()[following-sibling::*[1][self::lb[@break='no']]]">
    <!-- Do something about the space here. -->
</xsl:template>

Something similar would work for the space after the <lb>.

OK. But this is trickier:

This <emph>little <ref>tea </ref> </emph>
<lb break="no" />
pot, short and stout.

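Now the text inside the <ref> element is not a sibling of the <lb>. And the space before </ref>, the space before </emph>, and the line breaks before and after the <lb> all need to be removed from the output stream.

How?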

score 3 · Accepted Answer

Here is a tested, working implementation, including how to trim whitespace from the right or left of a text node:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Match if the preceding node (not necessarily sibling) that is either
      a non-empty-space-text node or an <lb> is an <lb break='no'> -->
    <xsl:template match="text()[
        (preceding::node()[
            self::text()[normalize-space() != ''] or
            self::lb])
                [last()]
        [self::lb[@break='no']]
        ]">

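        <!-- Trim whitespace on the left. Thanks to Alejandro,
            http://stackoverflow.com/a/3997107/423105 -->
        <xsl:variable name="firstNonSpace"
            select="substring(normalize-space(), 1, 1)"/>
        <!-- Guard: for a whitespace-only node $firstNonSpace is empty, and
            substring-after(., '') would typically return the whole string. -->
        <xsl:if test="$firstNonSpace != ''">
            <xsl:value-of select="concat($firstNonSpace,
                substring-after(., $firstNonSpace))"/>
        </xsl:if>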
    </xsl:template>

    <!-- Match if the next node (not necessarily sibling) that is either
      a non-empty-space-text node or an <lb> is an <lb break='no'> -->
    <xsl:template match="text()[
        following::node()[
            self::text()[normalize-space() != ''] or
            self::lb]
               [1]
        [self::lb[@break='no']]
        ]">

        <xsl:variable name="normalized" select="normalize-space()"/>
        <xsl:if test="$normalized != ''">
            <xsl:variable name="lastNonSpace"
                select="substring($normalized, string-length($normalized))"/>
            <xsl:variable name="trimmedSuffix">
                <xsl:call-template name="substring-after-last">
                    <xsl:with-param name="string" select="."/>
                    <xsl:with-param name="delimiter" select="$lastNonSpace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of select="substring(., 1, string-length(.) -
               string-length($trimmedSuffix))"/>
        </xsl:if>
        <!-- otherwise output nothing. -->
    </xsl:template>


    <!-- Thanks to Jeni Tennison:
        http://www.stylusstudio.com/xsllist/200111/post00460.html -->
    <xsl:template name="substring-after-last">
        <xsl:param name="string" />
        <xsl:param name="delimiter" />
        <xsl:choose>
            <xsl:when test="contains($string, $delimiter)">
                <xsl:call-template name="substring-after-last">
                    <xsl:with-param name="string"
                        select="substring-after($string, $delimiter)" />
                    <xsl:with-param name="delimiter" select="$delimiter" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise><xsl:value-of select="$string" /></xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

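My assumption here, pending an answer to my "next ambiguity" comment above, is that if there is an <lb> element without break="no", that <lb> counts as "surrounding text" in the sense that it acts as a boundary for ignoring whitespace.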
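
Since XPath 1.0 has no trim function, the two trimming idioms in the stylesheet are easy to misread. The following is a minimal Python sketch that models them step by step. It is only an illustration: `normalize_space`, `substring_after`, and `substring_after_last` are hand-written stand-ins for the XPath functions and the named template, `left_trim` and `right_trim` are names invented for this sketch, and whitespace-only input is handled explicitly and yields the empty string.

```python
import re


def normalize_space(s):
    """Model of XPath 1.0 normalize-space(): trim and collapse XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def substring_after(s, delim):
    """Model of XPath 1.0 substring-after(): text after the first match."""
    if delim == "":
        return s  # an empty delimiter matches at the very start
    _, sep, tail = s.partition(delim)
    return tail if sep else ""


def substring_after_last(s, delim):
    """Model of the recursive named template; delim must be non-empty."""
    while delim in s:
        s = substring_after(s, delim)
    return s


def left_trim(text):
    normalized = normalize_space(text)
    if normalized == "":
        return ""  # whitespace-only node: emit nothing
    first = normalized[0]  # substring(normalize-space(), 1, 1)
    # Everything before the first non-space character is dropped.
    return first + substring_after(text, first)


def right_trim(text):
    normalized = normalize_space(text)
    if normalized == "":
        return ""  # whitespace-only node: emit nothing
    last = normalized[-1]  # substring($normalized, string-length($normalized))
    # Whatever follows the last occurrence of that character is whitespace.
    suffix = substring_after_last(text, last)
    return text[:len(text) - len(suffix)]
```

The right-hand trim works because the last non-space character cannot occur again after its own last occurrence, so the remaining suffix is exactly the trailing whitespace to cut off.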

Sample input:

<test>
    <t1>
        This <emph>little <ref>tea </ref> </emph>
        <lb break="no" />
        pot, short and stout.        
    </t1>    
    <t2>
        This <emph>little <ref>tea </ref> </emph>
        <lb />
        <lb break="no" />
        pot, short and stout.        
    </t2>    
</test>

Output:

<test>
    <t1>
        This <emph>little <ref>tea</ref></emph><lb break="no"/>pot, short and stout.        
    </t1>    
    <t2>
        This <emph>little <ref>tea </ref> </emph>
        <lb/><lb break="no"/>pot, short and stout.        
    </t2>    
</test>

This output is correct, AFAICT. If not, let me know why, and I'll see about fixing it.
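
For anyone who wants to check the matching rule outside an XSLT processor, here is a rough Python sketch of the same idea using the standard-library `xml.dom.minidom`. It is an illustration under assumptions, not a port of the stylesheet: all function names are invented for this sketch, only space, tab, CR and LF count as whitespace, and a text node that sits between two `<lb break="no"/>` elements is trimmed on both sides here, whereas an XSLT processor applies only one template rule per node.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the characters normalize-space() treats as whitespace


def nodes_in_document_order(node):
    """Yield every descendant of `node` in document order."""
    for child in node.childNodes:
        yield child
        yield from nodes_in_document_order(child)


def is_boundary(node):
    """A boundary is a non-whitespace text node or any <lb> element."""
    if node.nodeType == node.TEXT_NODE:
        return node.data.strip(XML_WS) != ""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "lb"


def is_no_break(node):
    """True only for <lb break="no">."""
    return (node is not None
            and node.nodeType == node.ELEMENT_NODE
            and node.tagName == "lb"
            and node.getAttribute("break") == "no")


def trim_around_no_break(xml_text):
    doc = minidom.parseString(xml_text)
    ordered = list(nodes_in_document_order(doc))
    for i, node in enumerate(ordered):
        if node.nodeType != node.TEXT_NODE:
            continue
        # Nearest boundary before and after this text node.
        prev_b = next((n for n in reversed(ordered[:i]) if is_boundary(n)), None)
        next_b = next((n for n in ordered[i + 1:] if is_boundary(n)), None)
        if is_no_break(prev_b):
            node.data = node.data.lstrip(XML_WS)
        if is_no_break(next_b):
            node.data = node.data.rstrip(XML_WS)
    return doc.documentElement.toxml()


sample = ('<t1>This <emph>little <ref>tea </ref> </emph>\n'
          '<lb break="no" />\npot, short and stout.</t1>')
print(trim_around_no_break(sample))
```

Walking a flat document-order list stands in for the `preceding::` and `following::` axes; that is safe here because text nodes have no descendants and an empty `<lb>` is never an ancestor of a text node.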

score 1 · Accepted Answer

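~~Try a selector like this:~~

~~text()[matches(., '\S?\s*$') and not following::text()[matches('\S')] and following::lb[@break="no"]]~~

~~Of course, this is horrible and inefficient. But it might work.~~ That doesn't work because, as has been pointed out, you don't have matches(). Let me try again: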

OK, we are looking for four different scenarios:

The first preceding non-empty text node, if it ends with whitespace:

lb[@break='no']/preceding::text()[normalize-space()!='' and string-length(substring-after(.,normalize-space()))!=0][1]

Whitespace-only text nodes that follow the first preceding non-empty text node:

lb[@break='no']/preceding::text()[normalize-space()='' and preceding::text()[normalize-space()!='']]

The first following non-empty text node, if it begins with whitespace:

lb[@break='no']/following::text()[normalize-space()!='' and string-length(substring-before(.,normalize-space()))!=0][1]

Whitespace-only text nodes that precede the first following non-empty text node:

lb[@break='no']/following::text()[normalize-space()='' and following::text()[normalize-space()!='']]

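You cannot simply union these into one XPath 1.0 pattern and treat them alike, because each case needs different handling, so you will have to call a template from each of the above matches.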
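
To make the two whitespace tests in those expressions concrete, here is a small Python sketch that models them. The helper functions are hand-written stand-ins for the XPath 1.0 string functions, and `ends_with_space_test` and `starts_with_space_test` are names invented for this sketch. One thing it shows, as far as I can tell: the tests rely on `normalize-space()` being a literal substring of the node's text, so they give a false negative when the text contains an internal run of whitespace such as a line break or a double space.

```python
import re


def normalize_space(s):
    """Model of XPath 1.0 normalize-space(): trim and collapse XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def substring_after(s, delim):
    """Model of XPath 1.0 substring-after()."""
    if delim == "":
        return s
    _, sep, tail = s.partition(delim)
    return tail if sep else ""


def substring_before(s, delim):
    """Model of XPath 1.0 substring-before()."""
    if delim == "":
        return ""
    head, sep, _ = s.partition(delim)
    return head if sep else ""


def ends_with_space_test(text):
    # normalize-space()!='' and
    # string-length(substring-after(., normalize-space()))!=0
    n = normalize_space(text)
    return n != "" and len(substring_after(text, n)) != 0


def starts_with_space_test(text):
    # normalize-space()!='' and
    # string-length(substring-before(., normalize-space()))!=0
    n = normalize_space(text)
    return n != "" and len(substring_before(text, n)) != 0


for candidate in ["tea ", "tea", "little  tea ", "\n pot, short"]:
    print(repr(candidate),
          ends_with_space_test(candidate),
          starts_with_space_test(candidate))
```

If that limitation matters for your documents, comparing the first or last character of the text against the first or last character of `normalize-space()` avoids it.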
