xml - XSLT: Reading content delimited by empty tags

Question

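I am busy creating an XSLT file that transforms various XML documents into a new node layout.

There is one thing I can't figure out. Here is a sample of the XML I am working with: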

<page>
   This is a paragraph on the page.
    <newParagraph/>
   This is another paragraph.
    <newParagraph/>
   Here is yet another paragraph on this page.
</page>

As you can see, the paragraphs are separated by empty tags that act as delimiters. In the result XML I want this:

<page>
   <p>
    This is a paragraph on the page.
   </p>
   <p> 
    This is another paragraph.
   </p>
   <p>
   Here is yet another paragraph on this page.
   </p>
</page>

How can I achieve this with XSLT (version 1.0 only)?

score 0 · Accepted Answer

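The following answer is not as elegant as @stwissel's, but it correctly marks up any subtrees inside the paragraphs. It does get a bit nasty, admittedly. :-)

The difficulty with this task is that it requires special handling of the content between a closing tag and the next matching opening tag (e.g. </tag>...<tag>). XSLT, however, is built to process the content between an opening tag and its matching closing tag (e.g. <tag>...</tag>). By the way: there is a way to "cheat" a little. See my other answer to this question.

Suppose you have an input XML like this: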

<pages>
  <page>
    This is a paragraph on the page.
    <B>bold</B>
    After Bold
    <newParagraph/>
    This is another paragraph.
    <newParagraph/>
    Here is yet another paragraph on this page.
    <EM>
      <B>
        Bold and emphasized.
      </B>
    </EM>
    After bold and emphasized.
  </page>
  <page>
    Another page.
  </page>
</pages>

It can be processed with this XSLT 1.0 transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

  <xsl:template match="page">
    <page>
      <!-- handle the first paragraph up to the first newParagraph -->
      <P>
        <xsl:apply-templates select="node()[not(preceding-sibling::newParagraph)]" />
      </P>

      <!-- now handle all remaining paragraphs of the page -->
      <xsl:for-each select="newParagraph">
        <xsl:variable name="pCount" select="position()"/>
        <P>
          <xsl:apply-templates select="following-sibling::node()[count(preceding-sibling::newParagraph) &lt;= $pCount]" />
        </P>
      </xsl:for-each>
    </page>
  </xsl:template>

  <!-- this default rule recursively copies all substructures within a paragraph at tag level -->  
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>


  <!-- text nodes between the tags are already copied by the node() rule above;
       a separate text() rule would have the same default priority and be ambiguous -->

  <xsl:template match="newParagraph"/>

</xsl:stylesheet>

It produces this output:

<pages>
  <page><P>
    This is a paragraph on the page.
    <B>bold</B>
    After Bold
    </P><P>
    This is another paragraph.
    </P><P>
    Here is yet another paragraph on this page.
    <EM>
      <B>
        Bold and emphasized.
      </B>
    </EM>
    After bold and emphasized.
  </P></page>
  <page><P>
    Another page.
  </P></page>
</pages>
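
The sibling counting above can also be expressed with xsl:key, so that each node's paragraph number is computed once when the key is built. The following is a minimal sketch of that variant, not taken from any of the answers here; the key name byParagraph and the variable pageId are made up for illustration, and it assumes the same input with page elements whose children are separated by newParagraph.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" />

  <!-- index every child of page except the separators by
       "id of the page | number of newParagraph siblings before the node" -->
  <xsl:key name="byParagraph"
           match="page/node()[not(self::newParagraph)]"
           use="concat(generate-id(..), '|', count(preceding-sibling::newParagraph))" />

  <xsl:template match="page">
    <xsl:variable name="pageId" select="generate-id()" />
    <page>
      <!-- group 0: everything before the first newParagraph -->
      <P>
        <xsl:apply-templates select="key('byParagraph', concat($pageId, '|0'))" />
      </P>
      <!-- group n: everything after the n-th newParagraph of this page -->
      <xsl:for-each select="newParagraph">
        <P>
          <xsl:apply-templates select="key('byParagraph', concat($pageId, '|', position()))" />
        </P>
      </xsl:for-each>
    </page>
  </xsl:template>

  <!-- identity rule: copies inline markup such as B and EM unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The page id is part of the key value so that paragraph 1 of one page is never mixed with paragraph 1 of another. Whether this is noticeably faster than the predicate version likely depends on the processor and on how many paragraphs a page contains.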

score 0 · Accepted Answer

This is more or less a duplicate of another question, so the same approach will work:

<xsl:template match="pages">
    <xsl:apply-templates />
</xsl:template>

<xsl:template match="page/text()">
    <p><xsl:value-of select="."/></p>
</xsl:template>

<!-- element names are case-sensitive: the input uses newParagraph -->
<xsl:template match="newParagraph" />

Simple and clean. Hope this helps.
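
For reference, a complete stylesheet along these lines might look like the sketch below. It assumes the input from the question, where page is the root element and contains only plain text and newParagraph children. The page template, the xsl:strip-space declaration and the use of normalize-space() are additions for illustration and are not part of the answer above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />

  <!-- assumption: drop whitespace-only text nodes directly under page,
       so they do not become empty paragraphs -->
  <xsl:strip-space elements="page" />

  <!-- assumption: keep the page wrapper so the result has a single root element -->
  <xsl:template match="page">
    <page>
      <xsl:apply-templates />
    </page>
  </xsl:template>

  <!-- wrap each text node that is a direct child of page in a p element -->
  <xsl:template match="page/text()">
    <p><xsl:value-of select="normalize-space(.)" /></p>
  </xsl:template>

  <!-- the separators themselves produce no output -->
  <xsl:template match="newParagraph" />

</xsl:stylesheet>
```

This only works while every paragraph is a single text node. As soon as a paragraph contains inline markup such as B or EM, the text before and after that element becomes separate text nodes and each one gets its own p, so one of the sibling-grouping approaches in the other answers is needed.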

score 0 · Accepted Answer

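If you are willing to "cheat" a little, you can manually insert XML tags into the result document that are not part of the node tree but plain text. A downstream processor that re-parses the output will not notice the difference. Note that an XSLT processor is not required to support disable-output-escaping, and it only takes effect when the processor serializes the result itself.

Given the input from my other answer, the following XSLT 1.0 transformation does the job (preserving subtrees within the paragraphs):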

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

  <xsl:template match="page">
    <page>
      <P>
        <xsl:apply-templates/>
      </P>
    </page>
  </xsl:template>

  <!-- this default rule recursively copies all substructures within a paragraph at tag level -->  
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>


  <!-- text nodes between the tags are already copied by the node() rule above;
       a separate text() rule would have the same default priority and be ambiguous -->

  <xsl:template match="newParagraph">
    <!-- This inserts a matching closing and opening tag -->
    <xsl:value-of select="'&lt;/P&gt;&lt;P&gt;'" disable-output-escaping="yes" />
  </xsl:template>

</xsl:stylesheet>
