xslt - Returning page numbers for a set of elements (optimizing for XSLT 1.0/msxsl)

Question

This isn't a "how do I do xxx" but a "how do I do xxx in the best way?" (Really hoping to float Dimitre's boat...)

All of the following is complicated by the limitations of the XSL processor (msxsl: basically XSLT 1.0 plus the node-set(), replaces() and matches() extension functions).

I'm generating some metadata from certain elements in a book; say chapter and div[title] elements (to simplify our data model).

Page numbers in the book are given by processing instructions inside mixed content, which might look like this:

<?Page pageId="256"?>

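The page number my element needs to be associated with is either its first descendant Page PI (if the page break is essentially the first piece of content in the chapter, i.e. the chapter starts on a new page), or else the first preceding::processing-instruction('Page').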
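To pin the rule down, here is a minimal Python sketch (not XSLT, and not the asker's code) that applies it in one document-order pass: a chapter or div[title] takes the Page PI that is its very first content, ignoring whitespace, and otherwise the nearest Page PI seen before it. It assumes Python 3.8+ for `TreeBuilder(insert_pis=True)`, which keeps PIs in an ElementTree. All helper names are hypothetical, and matching `pageId` case-insensitively is an assumption made because the sample below also contains `pageID`.

```python
import re
import xml.etree.ElementTree as ET

PI = ET.ProcessingInstruction  # tag given to PI pseudo-elements
# Assumption: pageId/pageID are equivalent, hence IGNORECASE.
PAGE_ID = re.compile(r'\bpageid\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

def parse_with_pis(xml_text):
    # insert_pis=True keeps <?Page ...?> nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(xml_text, parser=parser)

def is_page_pi(node):
    # A PI's .text is "target data", e.g. 'Page pageId="1"'.
    return node.tag is PI and node.text.split(None, 1)[0] == "Page"

def is_target(elem):
    # Hypothetical stand-in for the pattern chapter|div[title].
    return elem.tag == "chapter" or (
        elem.tag == "div" and elem.find("title") is not None)

def leading_page_pi(elem):
    """Page PI that is the very first content of elem, ignoring whitespace."""
    while True:
        if elem.text and elem.text.strip():
            return None               # real text comes before any PI
        if len(elem) == 0:
            return None
        first = elem[0]
        if first.tag is PI:
            return first if is_page_pi(first) else None
        elem = first                  # descend into the first child element

def text_of(elem):
    """Concatenated text content, skipping the PIs' own data."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag is not PI:
            parts.append(text_of(child))
        parts.append(child.tail or "")
    return "".join(parts).strip()

def page_numbers(xml_text):
    """(title, page) for every target element, in one document-order pass."""
    result, last_pi = [], None        # last_pi: most recent Page PI so far
    for node in parse_with_pis(xml_text).iter():
        if is_page_pi(node):
            last_pi = node
        elif isinstance(node.tag, str) and is_target(node):
            pi = leading_page_pi(node)
            if pi is None:            # no page break opens this element
                pi = last_pi
            match = PAGE_ID.search(pi.text) if pi is not None else None
            title = node.find("title")
            result.append((text_of(title) if title is not None else None,
                           match.group(2) if match else None))
    return result
```

Fed the sample document below, this should return Chapter I on page 1, Section I on page 2 and Chapter II on page 4, matching the desired output. Only the first-child chain is inspected for a leading PI, so the extra cost per element is bounded by its nesting depth.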

Let's make up a sample document:

<?xml version="1.0" encoding="UTF-8"?>
<book>
    <chapter>
        <title><?Page pageId="1"?>Chapter I</title>
        <div>
            <p>Introduction to Chapter</p>
            <p>Second paragraph <?Page pageId="2"?>of introduction</p>
        </div>
        <div>
            <title>Section I</title>
            <p>A paragraph</p>
            <p>Another paragraph<?Page pageID="3"?></p>
        </div>
    </chapter>
    <chapter>
        <title><?Page pageId="4"?>Chapter II</title>
        <div>
            <p>Introduction to Chapter</p>
            <p>...</p>
        </div>
    </chapter>
</book>

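(Note that although every chapter here starts on a new page, we can't guarantee that in general. A blank page at the end of a chapter, as at the end of Chapter I here, is common for us.)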

I'd like to get something like this (I'm fine with the XSLT basics; it's selecting the page numbers that we're interested in):

<meta>
    <meta>
        <field type="title">Chapter I</field>
        <field type="page">1</field>
        <meta>
            <field type="title">Section I</field>
            <field type="page">2</field>
        </meta>
    </meta>
    <meta>
        <field type="title">Chapter II</field>
        <field type="page">4</field>
    </meta>
</meta>

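I can do all sorts of things with xsl:when statements and the descendant axis to decide which page number is appropriate, but I'd prefer to do some clever matching on the processing instructions, because doing things with the descendant axis on large books is currently far too slow to be usable. Keys would be nice, but things get more complicated because neither variables nor other keys can be used in the @use or @match attributes (and sequence constructors aren't available either).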
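The slowness comes from re-walking an axis for every element. The usual remedy, which is what keys are meant to provide in XSLT, is to index once and then look up. As an analogy only, here is a Python sketch of that idea for the "nearest preceding Page PI" half of the rule: one pass records each node's document-order position and the positions of all Page PIs, and each lookup is then a binary search with `bisect`. The function names are hypothetical, and Python 3.8+ is assumed for keeping PIs in the tree.

```python
import bisect
import xml.etree.ElementTree as ET

PI = ET.ProcessingInstruction  # tag given to PI pseudo-elements

def parse_with_pis(xml_text):
    # insert_pis=True keeps <?Page ...?> nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(xml_text, parser=parser)

def build_page_index(root):
    """Single pass: document-order position of every node plus all Page PIs."""
    position, pi_positions, pi_nodes = {}, [], []
    for pos, node in enumerate(root.iter()):
        position[node] = pos
        if node.tag is PI and node.text.split(None, 1)[0] == "Page":
            pi_positions.append(pos)   # already sorted: pos only increases
            pi_nodes.append(node)
    return position, pi_positions, pi_nodes

def preceding_page_pi(elem, index):
    """Nearest Page PI before elem's start tag, found by binary search."""
    position, pi_positions, pi_nodes = index
    i = bisect.bisect_left(pi_positions, position[elem])
    return pi_nodes[i - 1] if i > 0 else None
```

Building the index costs one traversal of the document; every lookup after that is O(log n) in the number of Page PIs rather than a fresh walk of the preceding axis.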

Currently, the elements I'm interested in finding page numbers for are defined in a key (the real-world data is much more complex), like this:

<xsl:key name="auth" match="chapter|div[title]" use="generate-id()"/>

Any suggestions or pointers gratefully received!

score 1 · Accepted Answer

Here is a solution using keys, which may be efficient:

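<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- A Page PI inside a chapter title is indexed under that title -->
 <xsl:key name="kPage"
   match="chapter/title/processing-instruction('Page')"
   use="generate-id(..)"/>

 <!-- Every Page PI is also indexed under the title of the first
      div[title] that follows it in document order -->
 <xsl:key name="kPage"
   match="processing-instruction('Page')"
   use="generate-id(following::div[title][1]/title)"/>

 <!-- Walk the tree one element at a time: first child, then next sibling -->
 <xsl:template match="*">
  <xsl:apply-templates select=
   "*[1]|following-sibling::*[1]"/>
 </xsl:template>

 <xsl:template match="chapter/title[1] | div/title[1]">
  <meta>
    <field type="title"><xsl:value-of select="."/></field>
    <field type="page">
      <!-- The last Page PI, in document order, indexed under this title -->
      <xsl:variable name="vPiText"
           select="key('kPage', generate-id())[last()]"/>
      <!-- Keep only the digits: the inner translate() yields the
           non-digit characters, the outer one deletes them -->
      <xsl:value-of select=
      "translate($vPiText,
                 translate($vPiText, '0123456789', ''),
                 ''
                 )"/>
    </field>

    <!-- Process the following siblings inside this meta, so that
         sections nest under their chapter -->
    <xsl:apply-templates select="*[1]|following-sibling::*[1]"/>
  </meta>
 </xsl:template>
</xsl:stylesheet>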
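To see what the two kPage definitions actually index, here is a Python model of them (an illustration, not part of the answer): each Page PI is filed under its chapter title (first definition) and under the title of the first div[title] that follows it (second definition), and the page field takes the last PI in document order and keeps only its digits with the same double translate() trick. The helpers, including `xpath_translate`, are hypothetical re-implementations of the XPath behaviour; Python 3.8+ is assumed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PI = ET.ProcessingInstruction  # tag given to PI pseudo-elements

def parse_with_pis(xml_text):
    # insert_pis=True keeps <?Page ...?> nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(xml_text, parser=parser)

def is_page_pi(node):
    return node.tag is PI and node.text.split(None, 1)[0] == "Page"

def pi_data(node):
    # XPath string value of a PI = its data, without the target name.
    parts = node.text.split(None, 1)
    return parts[1] if len(parts) > 1 else ""

def xpath_translate(s, src, dst):
    """XPath 1.0 translate(): map src chars to dst chars, delete the rest."""
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ch, dst[i] if i < len(dst) else "")  # first wins
    return "".join(table.get(ch, ch) for ch in s)

def digits_only(s):
    # translate(s, translate(s, '0123456789', ''), ''): the inner call
    # leaves only the non-digits of s, the outer call deletes them.
    return xpath_translate(s, xpath_translate(s, "0123456789", ""), "")

def build_kpage(root):
    """Model of the two kPage key definitions as a title -> [PI] multimap."""
    nodes = list(root.iter())                    # document order
    parent = {child: par for par in nodes for child in par}
    # Reverse pass: following::div[title][1]/title for each Page PI.
    next_title, upcoming = {}, None
    for node in reversed(nodes):
        if is_page_pi(node):
            next_title[node] = upcoming
        elif node.tag == "div" and node.find("title") is not None:
            upcoming = node.find("title")
    kpage = defaultdict(list)
    for node in nodes:                           # forward: lists stay in doc order
        if not is_page_pi(node):
            continue
        par = parent[node]
        grand = parent.get(par)
        # 1) match="chapter/title/processing-instruction('Page')" use="generate-id(..)"
        if par.tag == "title" and grand is not None and grand.tag == "chapter":
            kpage[par].append(node)
        # 2) match="processing-instruction('Page')"
        #    use="generate-id(following::div[title][1]/title)"
        if next_title[node] is not None:
            kpage[next_title[node]].append(node)
    return kpage

def page_of(title, kpage):
    """key('kPage', generate-id())[last()], reduced to its digits."""
    pis = kpage.get(title, [])
    return digits_only(pi_data(pis[-1])) if pis else ""
```

For the sample document this gives 1, 2 and 4 for Chapter I, Section I and Chapter II. The model also makes two properties of the stylesheet visible: as written, a chapter title is only keyed by a Page PI inside that title, so a chapter that does not open with a page break gets an empty page field; and the digit filter makes the pageId/pageID spelling difference irrelevant.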

When this transformation is applied to the provided XML document:

<book>
    <chapter>
        <title>
            <?Page pageId="1"?>Chapter I</title>
        <div>
            <p>Introduction to Chapter</p>
            <p>Second paragraph 
                <?Page pageId="2"?>of introduction</p>
        </div>
        <div>
            <title>Section I</title>
            <p>A paragraph</p>
            <p>Another paragraph
                <?Page pageID="3"?></p>
        </div>
    </chapter>
    <chapter>
        <title>
            <?Page pageId="4"?>Chapter II</title>
        <div>
            <p>Introduction to Chapter</p>
            <p>...</p>
        </div>
    </chapter>
</book>

the wanted, correct result is produced:

<meta>
   <field type="title">Chapter I</field>
   <field type="page">1</field>
   <meta>
      <field type="title">Section I</field>
      <field type="page">2</field>
   </meta>
</meta>
<meta>
   <field type="title">Chapter II</field>
   <field type="page">4</field>
</meta>
