I have an XML file built with TEI, as shown below:

<div type="chapter" n="1">
        <p>
          <s xml:id="e_1">sentence e1.</s>
          <s xml:id="f_1">sentence f1</s>
        </p>
        <p>
            <s xml:id="e_2"> sentence e2</s>
            <s xml:id="f_2"> sentence f2</s>
        </p>
</div>

<div type="chapter" n="2">
        <!-- -->
</div>

I need to convert it into this HTML structure:

<div>
<h1>Chapter 1</h1>
<div class="book-content">
 <p>
    <span class='source-language-sent' data-source-id='1'>sentence e1.</span>
    <span id='1' style='display:none'>sentence f1</span>
 </p>
 <p>
    <span class='source-language-sent' data-source-id='2'>sentence e2</span>
    <span id='2' style='display:none'>sentence f2</span>
 </p>
</div>
</div>
<div>
<h1>Chapter 2</h1>
<div class="book-content">
  <!-- -->
</div>
</div>

At the moment I am using this XSLT file:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0" version="1.0">
   <xsl:output method="html" encoding="UTF-8" indent="yes" />

   <xsl:template match="tei:body">
      <xsl:apply-templates />
   </xsl:template>

   <xsl:template match="tei:teiHeader">
      <xsl:comment>
         <xsl:apply-templates select="node()" />
      </xsl:comment>
   </xsl:template>

   <!--create chapter-->
   <xsl:template match="tei:div">
      <xsl:element name="div">
         <xsl:element name="div">
            <xsl:attribute name="class">
               <xsl:text>book-content</xsl:text>
            </xsl:attribute>
            <xsl:element name="h1">
               <xsl:text>Chapter</xsl:text>
               <xsl:value-of select="@n" />
            </xsl:element>
            <xsl:apply-templates select="node()" />
         </xsl:element>
      </xsl:element>
   </xsl:template>

   <!-- create p-->
   <xsl:template match="tei:p">
      <xsl:element name="p">
         <xsl:apply-templates />
      </xsl:element>
   </xsl:template>

   <!-- create s-->
   <xsl:template match="tei:s">
      <xsl:variable name="xmlid" select="@xml:id" />
      <xsl:if test="starts-with($xmlid, 'e')">
         <xsl:element name="span">
            <xsl:attribute name="class">
               <xsl:text>source-language-sent</xsl:text>
            </xsl:attribute>
            <xsl:attribute name="data-source-id">
               <xsl:value-of select="substring($xmlid, 3, 4)" />
            </xsl:attribute>
            <xsl:apply-templates select="node()" />
         </xsl:element>
      </xsl:if>
      <xsl:if test="starts-with($xmlid, 'f')">
         <xsl:element name="span">
            <xsl:attribute name="style">
               <xsl:text>display:none</xsl:text>
            </xsl:attribute>
            <xsl:attribute name="id">
               <xsl:value-of select="substring($xmlid, 3, 4)" />
            </xsl:attribute>
            <xsl:apply-templates select="node()" />
         </xsl:element>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>

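My problem is that I need to start a new <div class="book-content"> roughly every 900 characters. I don't want to cut my s elements, though, so I need to work out how many s elements to include in each <div class="book-content"> to get approximately 900 characters.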

1 Answer

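This is an interesting question, but there is too much else going on in your example. I would rather address the problem in isolation, using an example of my own.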

Consider the following input:

XML

<book>
    <chapter id="A">
        <para>
            <sentence id="1" length="23">Mary had a little lamb,</sentence>
            <sentence id="2" length="29">His fleece was white as snow,</sentence>
            <sentence id="3" length="30">And everywhere that Mary went,</sentence>
        </para>
        <para>
            <sentence id="4" length="24">The lamb was sure to go.</sentence>
            <sentence id="5" length="34">He followed her to school one day,</sentence>
        </para>
        <para>
            <sentence id="6" length="27">Which was against the rule,</sentence>
            <sentence id="7" length="35">It made the children laugh and play</sentence>
            <sentence id="8" length="24">To see a lamb at school.</sentence>
        </para>
        <para>
            <sentence id="9" length="34">And so the teacher turned it out, </sentence>
            <sentence id="10" length="27">But still it lingered near.</sentence>
        </para>
    </chapter>
    <chapter id="B">
        <para>
            <sentence id="11" length="35">Summertime, and the livin' is easy.</sentence>
            <sentence id="12" length="40">Fish are jumpin' and the cotton is high.</sentence>
            <sentence id="13" length="52">Oh, Your daddy's rich and your mamma's good lookin'.</sentence>
            <sentence id="14" length="35">So hush little baby, don't you cry.</sentence>
            <sentence id="15" length="54">One of these mornings you're going to rise up singing.</sentence>
        </para>
        <para>
            <sentence id="16" length="57">Then you'll spread your wings and you'll take to the sky.</sentence>
            <sentence id="17" length="35">So hush little baby, don't you cry.</sentence>
        </para>
    </chapter>
</book>

The length values are for reference only; we will not use them in the solution.

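Our task is to split every chapter whose total length exceeds 200 characters into several chapters, moving only whole sentences and preserving the original paragraph boundaries between groups of sentences.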
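
Independently of XSLT, the grouping rule itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the solution: the function names split_chapter and regroup_paragraphs are hypothetical, each sentence is modelled as a (paragraph index, text) pair, and the data is chapter A from the input above. It packs whole sentences into a chunk until the next one would push the total past the limit, and a single sentence longer than the limit still gets a chunk of its own.

```python
from itertools import groupby


def split_chapter(sentences, limit=200):
    """Greedily pack (para_index, text) pairs into chunks of at most `limit` characters.

    Assumption: a single sentence longer than `limit` is kept whole in its own chunk.
    """
    chunks, current, total = [], [], 0
    for para, text in sentences:
        # Close the current chunk if this sentence would push it over the limit.
        if current and total + len(text) > limit:
            chunks.append(current)
            current, total = [], 0
        current.append((para, text))
        total += len(text)
    if current:
        chunks.append(current)
    return chunks


def regroup_paragraphs(chunk):
    """Rebuild the paragraphs inside one chunk, keeping the original boundaries."""
    return [[text for _, text in group]
            for _, group in groupby(chunk, key=lambda item: item[0])]


# Chapter A from the sample input; the first number is the paragraph index.
chapter_a = [
    (0, "Mary had a little lamb,"),
    (0, "His fleece was white as snow,"),
    (0, "And everywhere that Mary went,"),
    (1, "The lamb was sure to go."),
    (1, "He followed her to school one day,"),
    (2, "Which was against the rule,"),
    (2, "It made the children laugh and play"),
    (2, "To see a lamb at school."),
    (3, "And so the teacher turned it out, "),  # trailing space counts
    (3, "But still it lingered near."),
]

chunks = split_chapter(chapter_a)
lengths = [sum(len(text) for _, text in chunk) for chunk in chunks]
print(lengths)  # [167, 120]
print([len(p) for p in regroup_paragraphs(chunks[0])])  # [3, 2, 1]
```

The third paragraph ends up divided between the two chunks, which is the behaviour the task asks for: sentences stay whole, and each chunk keeps only the paragraphs (or parts of paragraphs) whose sentences it contains.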

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common"
xmlns:set="http://exslt.org/sets"
extension-element-prefixes="exsl set">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="chapter">
    <xsl:call-template name="split-chapter">
        <xsl:with-param name="nodes" select="para/sentence"/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="split-chapter">
    <xsl:param name="nodes"/>
    <xsl:param name="limit" select="200"/>
    <xsl:param name="remaining-nodes" select="dummy-node"/>
    <!-- 1. Calculate the total length of nodes -->
    <xsl:variable name="lengths">
        <xsl:for-each select="$nodes">
            <length>
                <xsl:value-of select="string-length()" />
            </length>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="total-length" select="sum(exsl:node-set($lengths)/length)" />
    <!-- 2. Process the chapter: -->
    <xsl:choose>
        <!-- If chapter is too long and can be shortened ... -->
        <xsl:when test="$total-length > $limit and count($nodes) > 1">
            <!-- ... try again with one node less. -->
            <xsl:call-template name="split-chapter">
                <xsl:with-param name="nodes" select="$nodes[not(position()=last())]"/>
                <xsl:with-param name="remaining-nodes" select="$remaining-nodes | $nodes[last()]"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <!-- Otherwise create a chapter with the current nodes ... -->
            <chapter id="{@id}" length="{$total-length}" >
                <!-- ... list the paras participating in this chapter ... -->
                <xsl:for-each select="$nodes/parent::para">
                    <para>
                        <!-- ... and process the nodes still left in each para. -->
                        <xsl:apply-templates select="set:intersection(sentence, $nodes)"/>
                    </para>
                </xsl:for-each>
            </chapter>
            <!-- Then process any remaining nodes. -->
            <xsl:if test="$remaining-nodes">
                <xsl:call-template name="split-chapter">
                    <xsl:with-param name="nodes" select="$remaining-nodes"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="utf-8"?>
<book>
   <chapter id="A" length="167">
      <para>
         <sentence id="1" length="23">Mary had a little lamb,</sentence>
         <sentence id="2" length="29">His fleece was white as snow,</sentence>
         <sentence id="3" length="30">And everywhere that Mary went,</sentence>
      </para>
      <para>
         <sentence id="4" length="24">The lamb was sure to go.</sentence>
         <sentence id="5" length="34">He followed her to school one day,</sentence>
      </para>
      <para>
         <sentence id="6" length="27">Which was against the rule,</sentence>
      </para>
   </chapter>
   <chapter id="A" length="120">
      <para>
         <sentence id="7" length="35">It made the children laugh and play</sentence>
         <sentence id="8" length="24">To see a lamb at school.</sentence>
      </para>
      <para>
         <sentence id="9" length="34">And so the teacher turned it out, </sentence>
         <sentence id="10" length="27">But still it lingered near.</sentence>
      </para>
   </chapter>
   <chapter id="B" length="162">
      <para>
         <sentence id="11" length="35">Summertime, and the livin' is easy.</sentence>
         <sentence id="12" length="40">Fish are jumpin' and the cotton is high.</sentence>
         <sentence id="13" length="52">Oh, Your daddy's rich and your mamma's good lookin'.</sentence>
         <sentence id="14" length="35">So hush little baby, don't you cry.</sentence>
      </para>
   </chapter>
   <chapter id="B" length="146">
      <para>
         <sentence id="15" length="54">One of these mornings you're going to rise up singing.</sentence>
      </para>
      <para>
         <sentence id="16" length="57">Then you'll spread your wings and you'll take to the sky.</sentence>
         <sentence id="17" length="35">So hush little baby, don't you cry.</sentence>
      </para>
   </chapter>
</book>
answered 2015-04-25T16:38:23.160