
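I have a pipe-delimited text file like the one below, and I need to convert it with XSL into a well-formed XML structure (sample below). The XSL below is my latest attempt at solving this, but when walking the file line by line I cannot find a way to wrap the 002-level elements inside the 001-level element, i.e. to preserve the parent-child relationship. Can anyone help?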

Pipe-delimited file (input)

001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG

XML file (desired output)

<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
      </level002>
   </level001>
</root>

Current XSL

<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col> 
</xsl:variable>


 <xsl:template match="/" name="main">
<xsl:choose>
    <xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
       <xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
       <xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
       <root>
       <xsl:for-each select="$lines[position() &gt; 0]">
        <xsl:if test="translate(., '&#160; &#9;&#10;&#13;',  '') != ''">
            <level001>
            <xsl:variable name="line" select="." />
            <xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>    
            <xsl:choose>
                <xsl:when test="$columns[1]='001'">
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                </xsl:when>
                <xsl:when test="$columns[1]='002'">
                    <level002>
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                    </level002>
                </xsl:when>
            </xsl:choose>                               
            </level001>
        </xsl:if>
       </xsl:for-each>
       </root>
    </xsl:when>         
</xsl:choose>
 </xsl:template>

3 Answers


You can find a solution to essentially the same problem here:

http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml

The core of it is a recursive grouping template:

<xsl:template name="process-level">
  <xsl:param name="population" required="yes" as="element()*"/>
  <xsl:param name="level" required="yes" as="xs:integer"/>
  <xsl:for-each-group select="$population" 
       group-starting-with="*[xs:integer(@level) eq $level]">
    <xsl:element name="{@tag}">
      <xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
      <xsl:value-of select="normalize-space(@text)"/>
      <xsl:call-template name="process-level">
        <xsl:with-param name="population" 
                        select="current-group()[position() != 1]"/>
        <xsl:with-param name="level" 
                        select="$level + 1"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:for-each-group>
</xsl:template>
Answered 2012-09-04T09:43:04.830

I would first convert the flat text into a flat XML structure and then group that with for-each-group group-starting-with, as in the following code sample:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="mf xs"
  version="2.0">

<xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
<xsl:param name="sep" as="xs:string" select="'\|'"/>
<xsl:param name="field" as="xs:string" select="'field'"/>

<xsl:output indent="yes"/>

<xsl:function name="mf:group" as="node()*">
  <xsl:param name="nodes" as="node()*"/>
  <xsl:param name="level" as="xs:integer"/>
  <xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
    <xsl:element name="level{*[1]}">
      <xsl:copy-of select="*"/>
      <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
    </xsl:element>
  </xsl:for-each-group>
</xsl:function>

<xsl:template name="main">
  <xsl:variable name="flat">
    <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
      <line>
        <xsl:for-each select="tokenize(., $sep)">
          <elem name="{$field}{position()}">
            <xsl:value-of select="."/>
          </elem>
        </xsl:for-each>
      </line>
    </xsl:for-each>
  </xsl:variable>
  <root>
    <xsl:sequence select="mf:group($flat/line, 1)"/>
  </root>
</xsl:template>

</xsl:stylesheet>

When I apply that stylesheet with Saxon 9 using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
         <level/>
      </level002>
   </level001>
</root>

The stylesheet has a parameter named text-url for the URL of the plain text file, which you can set when running the stylesheet.
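
The empty <level/> element at the end of the result above most likely comes from a trailing newline in the input file: tokenize then returns a final zero-length string, which becomes an empty line element. A minimal sketch, assuming the file ends with a newline (the inline sample text is made up for illustration), of how a [normalize-space()] predicate drops such blank lines:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Illustrative sample: two records followed by a trailing newline -->
    <xsl:variable name="text" select="'001|XXX|YYY&#10;002|AAA|BBB&#10;'"/>
    <!-- The trailing newline yields one extra, zero-length token -->
    <xsl:value-of select="count(tokenize($text, '\r?\n'))"/>
    <xsl:text> </xsl:text>
    <!-- The predicate keeps only tokens that contain non-whitespace text -->
    <xsl:value-of select="count(tokenize($text, '\r?\n')[normalize-space()])"/>
  </xsl:template>

</xsl:stylesheet>
```

Run with -it:main, this should print 3 2. In the stylesheet above, the same predicate would go on the outer tokenize call, i.e. select="tokenize(unparsed-text($text-url), '\r?\n')[normalize-space()]".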

Answered 2012-09-04T09:51:38.693

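Well, you are iterating over every line, and the level001 tag has already been closed by the time you finish that line. Why not try something like this (pseudocode):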

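  • For each line
  • If the line is level001
  • Print <level001>
  • Get the index of the next level001 line
    • For each level002 line between this line and the next level001 line
    • Print <level002>
    • Print the body of the level002 line
    • Print </level002>
  • Print </level001>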
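
The steps above could be sketched in XSLT 2.0 roughly as follows. This is one possible reading of the pseudocode, not a tested drop-in replacement: the parameter name pathToCSV is borrowed from the question, its default value input.txt and the helper template fields are made up for illustration, and every record is assumed to start with 001| or 002|.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Assumed default; override it on the command line -->
  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>

  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- All non-blank lines of the file -->
    <xsl:variable name="lines" as="xs:string*"
      select="tokenize(unparsed-text($pathToCSV), '\r?\n')[normalize-space()]"/>
    <!-- Positions of the lines that open a level001 record -->
    <xsl:variable name="starts" as="xs:integer*"
      select="for $i in 1 to count($lines)
              return if (starts-with($lines[$i], '001|')) then $i else ()"/>
    <root>
      <xsl:for-each select="$starts">
        <xsl:variable name="start" as="xs:integer" select="."/>
        <xsl:variable name="pos" as="xs:integer" select="position()"/>
        <!-- Index of the next 001 line, or one past the last line -->
        <xsl:variable name="next" as="xs:integer"
          select="($starts[$pos + 1], count($lines) + 1)[1]"/>
        <level001>
          <xsl:call-template name="fields">
            <xsl:with-param name="line" select="$lines[$start]"/>
          </xsl:call-template>
          <!-- Every 002 line between this 001 line and the next one -->
          <xsl:for-each select="$lines[position() gt $start and position() lt $next]
                                      [starts-with(., '002|')]">
            <level002>
              <xsl:call-template name="fields">
                <xsl:with-param name="line" select="."/>
              </xsl:call-template>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each>
    </root>
  </xsl:template>

  <!-- Hypothetical helper: emits one elem per pipe-separated value -->
  <xsl:template name="fields">
    <xsl:param name="line" as="xs:string"/>
    <xsl:for-each select="tokenize($line, '\|')">
      <elem name="field{position()}">
        <!-- normalize-space trims the value and collapses inner whitespace -->
        <xsl:value-of select="normalize-space(.)"/>
      </elem>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon 9 this could be run as java -jar saxon9he.jar -it:main -xsl:sheet.xsl pathToCSV=input.txt. Because the level001 element stays open until all of its 002 lines have been written, the parent-child nesting is preserved; the index lookup costs more than a grouping instruction on large files, so it is mainly useful as a direct translation of the pseudocode.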
Answered 2012-09-04T09:19:30.857