My source is:

<content>
  <caption>text 1</caption>
  <element1>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
  <section1>
     <element2>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
   </section1>
 </content>

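I am trying to extract, and create unique IDs for, elements that have both child (inline character) elements and text (this could be any element), as well as elements that contain only text. The <bold> and <a> elements should not be split out of their parent: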

  <caption id="id1">Text 1</caption>
  <element1 id="id2">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
  <element2 id="id3">Notepad....</element2>

Any ideas would be appreciated.

1 Answer

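I am not quite sure whether you want to preserve the hierarchy or output a flat list of the elements you described. The following simply extracts the described elements as a flat list (while preserving their content), with the ids generated by the XSLT processor: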

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*[not(*) and text()[normalize-space()]] | *[* and text()[normalize-space()]]">
  <xsl:copy>
    <xsl:attribute name="id" select="generate-id()"/>
    <xsl:apply-templates select="@* , node()" mode="copy"/>
  </xsl:copy>
</xsl:template>

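<!-- Identity copy in mode "copy": matching @* as well keeps source attributes as attributes instead of letting the built-in rule emit their values as text -->
<xsl:template match="@* | node()" mode="copy">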
  <xsl:copy>
    <xsl:apply-templates select="@* , node()" mode="#current"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

When applied to your input sample, Saxon 9 outputs

<?xml version="1.0" encoding="UTF-8"?>
<caption id="d1e2">text 1</caption>
<element1 id="d1e4">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<element2 id="d1e13">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
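
If sequential ids of the form id1, id2, id3 (as in the question) are preferred over processor-generated ones, a minimal sketch along the same lines could number the elements with xsl:number instead of calling generate-id(). This is an untested variant and not part of the original answer. It assumes that only the outermost text-bearing elements should be counted, so that inline children such as <bold> and <a> do not consume a number:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- Any element that directly contains non-whitespace text,
     with or without inline child elements -->
<xsl:template match="*[text()[normalize-space()]]">
  <xsl:copy>
    <!-- Keep existing attributes; an existing id would be replaced below -->
    <xsl:copy-of select="@*"/>
    <xsl:attribute name="id">
      <xsl:text>id</xsl:text>
      <!-- Assumption: count only text-bearing elements that have no
           text-bearing ancestor, so nested inline elements are skipped -->
      <xsl:number level="any"
        count="*[text()[normalize-space()]]
                [not(ancestor::*[text()[normalize-space()]])]"/>
    </xsl:attribute>
    <!-- Copy the content unchanged, inline elements included -->
    <xsl:copy-of select="node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Container elements such as <content> and <section1> hold only whitespace text, so they fall through to the built-in template rules and only their matching descendants are output. The intent is that <caption>, <element1> and <element2> are numbered in document order.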
answered 2013-06-06T09:33:46.980