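I'm finishing up my first major XSLT project and am fairly new to the language, so please bear with my ignorance.

Our group is working on converting existing XML into an entirely different markup system. I've built an approach that uses analyze-string to process MathType callouts (denoted by "${TEXT}"), but I'm having trouble working out how to handle markup such as italic tags (denoted by "I" tags), which needs to be preserved in the result.

I tried using copy-of in the non-matching-substring, but that doesn't seem to work. And of course value-of gets me everything except the ital tags.

I realize the variable ($stemString) is redundant at this point. I went down that path thinking I might come up with something that would allow copy-of to work, but so far no luck.

Sample code: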

<stem>What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}</stem>

My current XSLT:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="assessmentItem">

<!--SNIP-->

    <xsl:apply-templates select="stemArea/stem"/>

<!--SNIP-->

</xsl:template>

<xsl:template match="stem">

    <xsl:variable name="stemString">
        <xsl:copy-of select="./* | ./text()"/>
    </xsl:variable>

    <xsl:choose>

        <!--Tests for empty stems that aren't art callouts-->
        <xsl:when test=". = '' and @type!='art'"></xsl:when>
        <xsl:when test=". = ' ' and @type!='art'"></xsl:when>

        <!--Test for art callouts-->
        <xsl:when test="@type='art'"><p><img alt="{@loc}" height="10" id="{@loc}" label="" longdesc="normal" src="{@loc}" width="10"/></p></xsl:when>

        <!--Test for boxed text-->
        <xsl:when test="@style='box' or @style='boxL'"><p><span label="Tag_7">
            <xsl:copy-of select="./* | ./text()"></xsl:copy-of>
        </span></p></xsl:when>

        <xsl:otherwise><p>

            <!--Are MathType tokens present in stem?-->
            <xsl:analyze-string regex="(\$\{{.+\}})" select="$stemString">

                <!--If MathType tokens are in stem, do the following-->
                <xsl:matching-substring>

                    <xsl:analyze-string regex="(\$\{{)(##.+[eqn|art]\d+)([^a-zA-Z0-9]?.*\}})" select=".">
                        <xsl:matching-substring>
                            <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <xsl:text>ERROR</xsl:text>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>

                </xsl:matching-substring>

                <!--No MathType tokens in string-->
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>

            </xsl:analyze-string>
        </p></xsl:otherwise>

    </xsl:choose>

</xsl:template>

Desired output:

<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>

What I'm getting:

<p>What is the value of f(x) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>

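Does anyone have any ideas on how to proceed?

@Martin Honnen: Thanks for your response. Your code fixed the error.

However, I have one more problem. When a stem contains more than one MathType callout, the output is wrong. I'm sure the cause is that my regex isn't capturing everything correctly, but I've been working on this for a while to no avail. The issue is illustrated below.

Sample code: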

<stem type="text">What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}, and ${##A112800eqn02:3} is 3.</stem>

Desired output:

<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/>, and <img alt="##A112800eqn02" height="10" id="##A112800eqn02" label="" longdesc="normal" src="##A112800eqn02" width="10"/> is 3.</p>

What I'm getting:

<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01:3}, and ${##A112800eqn02" height="10" id="##A112800eqn01:3}, and ${##A112800eqn02" label="" longdesc="normal" src="##A112800eqn01:3}, and ${##A112800eqn02" width="10"/> is 3.</p>
1 Answer

Don't match on an element and then put xsl:choose inside the template to distinguish further; instead, simply write templates for the different elements or for elements with certain attribute values.

And if you want to use analyze-string, then do that in a template for a text node, not in the template for an element containing mixed content:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="assessmentItem">

<!--SNIP-->

    <xsl:apply-templates select="stemArea/stem"/>

<!--SNIP-->

</xsl:template>

<xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>

<xsl:template match="stem[@style='box' or @style='boxL']">
  <p><span label="Tag_7"><xsl:apply-templates/></span></p>
</xsl:template>

<xsl:template match="stem[.//text()[matches(., '\$\{.+\}')]]">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

<xsl:template match="stem//text()[matches(., '\$\{.+\}')]">
  <xsl:analyze-string regex="(\$\{{)(##.+(eqn|art)\d+)([^a-zA-Z0-9]?.*\}})" select=".">
    <xsl:matching-substring>
      <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
        <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>

With that stylesheet, when applied to the input

<stem>What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}</stem>

I get the result

<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>

The above is meant as a suggestion on how to approach your stylesheet design. It is likely not a complete solution, as I don't have many input samples to test with and don't know the input XML and text format you are trying to process.

I would probably implement

<xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>

as

<xsl:template match="stem[not(normalize-space()) and @type!='art']"/>

instead, but I have mainly tried to show how to structure the stylesheet with templates and how to match on a descendant text node of stem to ensure that analyze-string does not swallow element nodes inside stem.
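
The art-callout branch of the question's xsl:choose can be written as its own template in the same style. The following is only a sketch: it assumes art stems carry their reference in @loc, as in the question, and it swaps @type != 'art' for not(@type = 'art') on the assumption that stems without a type attribute should also be treated as non-art (with @type != 'art' the test is false when the attribute is missing).

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Art callouts: same literal result element as the xsl:when branch in the question -->
<xsl:template match="stem[@type = 'art']">
  <p><img alt="{@loc}" height="10" id="{@loc}" label="" longdesc="normal" src="{@loc}" width="10"/></p>
</xsl:template>

<!-- Empty or whitespace-only stems that are not art callouts produce no output.
     not(@type = 'art') is also true when the type attribute is absent. -->
<xsl:template match="stem[not(normalize-space()) and not(@type = 'art')]"/>

</xsl:stylesheet>
```

These two patterns cannot both match the same stem. If a stem could match several of the other stem templates at once (for example a boxed stem that also contains a MathType token), an explicit priority attribute on the templates would probably be needed to make the choice unambiguous.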

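As for your edited input requirement: the greedy .+ and .* in the original expression run from the first ${ to the last }, which is why both tokens ended up in a single img. I have changed the regular expression to use non-greedy matching (.*?), so with the code below you should be able to match several patterns in a stem and create several img elements: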

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="assessmentItem">

<!--SNIP-->

    <xsl:apply-templates select="stemArea/stem"/>

<!--SNIP-->

</xsl:template>

<xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>

<xsl:template match="stem[@style='box' or @style='boxL']">
  <p><span label="Tag_7"><xsl:apply-templates/></span></p>
</xsl:template>

<xsl:template match="stem[.//text()[matches(., '\$\{.+?\}')]]">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

<xsl:template match="stem//text()[matches(., '\$\{.+?\}')]">
  <xsl:analyze-string regex="(\$\{{)(##.+?(eqn|art)\d+)([^a-zA-Z0-9]?.*?\}})" select=".">
    <xsl:matching-substring>
      <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
        <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>
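
As a possible variation, not tested against your real data, a negated character class can be used so that no part of a match can run past a closing brace. The sketch below assumes a token never contains a } before its end. It writes the alternation as a group, (eqn|art), because a bracketed form such as [eqn|art] would be a character class matching a single character. It also simplifies the templates by wrapping every stem in p and analyzing every text node under stem, which is an assumption about what you want. Here the token id is regex-group(1).

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template: copies I elements and everything else unchanged -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

<!-- Simplification for this sketch: every stem becomes a p -->
<xsl:template match="stem">
  <p><xsl:apply-templates/></p>
</xsl:template>

<!-- Braces are doubled because the regex attribute is an attribute value template.
     After unescaping, the regex is: \$\{(##[^\}]*?(eqn|art)\d+)[^\}]*\} -->
<xsl:template match="stem//text()">
  <xsl:analyze-string select="." regex="\$\{{(##[^\}}]*?(eqn|art)\d+)[^\}}]*\}}">
    <xsl:matching-substring>
      <img alt="{regex-group(1)}" height="10" id="{regex-group(1)}" label="" longdesc="normal" src="{regex-group(1)}" width="10"/>
    </xsl:matching-substring>
    <!-- Text outside tokens is passed through as-is -->
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>
```

Applied to the stem with two tokens from the edited question, this should produce one img per token, with the text ", and " and " is 3." left in place between and after them.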
Answered 2013-05-24T09:11:13.970