I have a very large input document (thousands of Records) with a structure something like this (Data represents many child elements):

<Input>
  <Record id="1">
    <Data/>
  </Record>
  <Record id="2">
    <Data/>
  </Record>
  <Record id="3">
    <Data/>
  </Record>
  <Record id="4">
    <Data/>
  </Record>
  <Record id="5">
    <Data/>
  </Record>
  <Record id="6">
    <!-- This is bad data -->
    <BadData/>
  </Record>
  <Record id="7">
    <Data/>
  </Record>
  <Record id="8">
    <Data/>
  </Record>
  <Record id="9">
    <!-- Also bad data -->
    <BadData/>
  </Record>
</Input>

I'm processing it with a stylesheet that performs a complex transform on each Record, which could run into many dynamic errors. In this application, if a few records have bad data, I would prefer not to halt the transform, but I would like to know about the errors so I can fix them later. I'm using xsl:try/xsl:catch to allow the processing to continue:

<xsl:stylesheet
  version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:err="http://www.w3.org/2005/xqt-errors"
  exclude-result-prefixes="xs err">

  <xsl:output indent="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="Input">
    <Output>
      <xsl:apply-templates/>
    </Output>
  </xsl:template>

  <xsl:template match="Record">
    <xsl:variable name="preprocessed" as="element(GoodData)?">
      <xsl:try>
        <xsl:apply-templates mode="preprocess" select="."/>
        <xsl:catch>
          <xsl:message expand-text="yes">Couldn't create good data for {@id} Code: {$err:code} {$err:description}</xsl:message>
        </xsl:catch>
      </xsl:try>
    </xsl:variable>
    <!-- Do some more logic on the preprocessed record -->
    <xsl:if test="$preprocessed">
      <NewRecord id="{@id}">
        <xsl:sequence select="$preprocessed"/>
      </NewRecord>
    </xsl:if>
  </xsl:template>



  <xsl:template mode="preprocess" match="Record">
    <!-- This represents a very complex transform with many potential dynamic errors -->
    <xsl:variable name="source" as="element(Data)" select="*"/>
    <xsl:if test="$source">
      <GoodData/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

This works fine, but it's a pain to dig through the large input documents to find the few records that failed. What I'd like to do is write the source of the Record elements that fail to a new Input document using xsl:result-document. I'm trying to add an xsl:accumulator something like this:

<xsl:accumulator name="failed-source" initial-value="()" as="element(Record)*">
  <xsl:accumulator-rule match="Record" phase="end">
    <xsl:sequence select="$value, .[false()(:test for failure:)]"/>
  </xsl:accumulator-rule>
</xsl:accumulator>

<xsl:template match="Input">
  <Output>
    <xsl:apply-templates/>
  </Output>
  <xsl:if test="accumulator-after('failed-source')">
    <xsl:result-document href="failed.input.xml">
      <Input>
        <xsl:sequence select="accumulator-after('failed-source')"/>
      </Input>
    </xsl:result-document>
  </xsl:if>
</xsl:template>

However, I can't figure out what the predicate in the xsl:accumulator-rule should be, or if it's even possible to use this pattern. Can a single result document be created without restructuring the stylesheet?

NB: I'm aware of the following solution, but it wasn't my first choice because it seems like it could have much higher memory requirements, though perhaps that isn't true. I could also write all the Records out to individual files, but I consider this dangerous because one source document might generate thousands of failures.

<xsl:template match="Input">
  <xsl:variable name="processed" as="document-node()">
    <xsl:document>
      <xsl:apply-templates/>
    </xsl:document>
  </xsl:variable>
  <xsl:if test="$processed/NewRecord">
    <Output>
      <xsl:sequence select="$processed/NewRecord"/>
    </Output>
  </xsl:if>
  <xsl:if test="$processed/Record">
    <xsl:result-document href="failed.input.xml">
      <Input>
        <xsl:sequence select="$processed/Record"/>
      </Input>
    </xsl:result-document>
  </xsl:if>
</xsl:template>

<xsl:template match="Record">
  <xsl:variable name="preprocessed" as="element(GoodData)?">
    <xsl:try>
      <xsl:apply-templates mode="preprocess" select="."/>
      <xsl:catch>
        <xsl:message expand-text="yes">Couldn't create good data for {@id} Code: {$err:code} {$err:description}</xsl:message>
      </xsl:catch>
    </xsl:try>
  </xsl:variable>
  <!-- Do some more logic on the preprocessed record -->
  <xsl:choose>
    <xsl:when test="$preprocessed">
      <NewRecord id="{@id}">
        <xsl:sequence select="$preprocessed"/>
      </NewRecord>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

1 Answer


This is an interesting approach.

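The value of an accumulator must always be a pure function of the input nodes. There is no way to feed in information from other activities, such as whether the processing of a node failed. It's not clear to me whether you can detect "bad records" independently of the processing you perform on those records. If you can, that is, if you are effectively doing custom validation of the input, then this pattern could work well. (But in that case I don't think you would be using try/catch. Rather, your main-line processing would first check the accumulator to see whether the data is valid.)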
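As a minimal sketch of the "custom validation" case, suppose a bad Record can be recognized from the input alone. The test used here, "not exactly one child element, and that child is a Data element", is an assumption that mirrors the toy `as="element(Data)"` check in the question; a real validity test would be whatever the data allows. The `xsl:mode` declaration is included so the accumulator applies to the principal input document.

```xml
<!-- Make the accumulator applicable to the principal source document -->
<xsl:mode use-accumulators="failed-source"/>

<xsl:accumulator name="failed-source" initial-value="()" as="element(Record)*">
  <xsl:accumulator-rule match="Record" phase="end">
    <!-- Assumption: a Record is bad unless it has exactly one child,
         and that child is a Data element. This depends only on the
         input node, so it remains a pure function of the input. -->
    <xsl:sequence select="$value, .[not(count(*) eq 1 and Data)]"/>
  </xsl:accumulator-rule>
</xsl:accumulator>
```

With this in place, the `match="Input"` template from the question could call `accumulator-after('failed-source')` unchanged. Because the accumulator holds references to source nodes, this sketch is intended for non-streamed processing.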

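Note that the specification of accumulators allows the computation of one accumulator to access other accumulators, but this is not currently implemented in Saxon.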

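I think a more usual approach to this problem would be to write the results of successful processing and the reports of unsuccessful processing to the same result tree, and then split that tree in a subsequent transformation pass. Unfortunately, the streaming capability of XSLT 3.0 offers nothing in the area of multi-pass processing. For the splitting pass, however, xsl:fork might be appropriate.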
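The splitting pass could look roughly like the sketch below. It assumes the first pass wrote a single `Output` element containing both `NewRecord` elements (successes) and the original `Record` elements (failures), much like the alternative shown in the question. That intermediate layout is an assumption, not something the first stylesheet produces as written.

```xml
<xsl:stylesheet
  version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumed input: <Output> with NewRecord and Record children mixed together -->
  <xsl:template match="/Output">
    <xsl:fork>
      <!-- Branch 1: successful results go to the principal output -->
      <xsl:sequence>
        <Output>
          <xsl:copy-of select="NewRecord"/>
        </Output>
      </xsl:sequence>
      <!-- Branch 2: failed source records go to a secondary document -->
      <xsl:sequence>
        <xsl:result-document href="failed.input.xml">
          <Input>
            <xsl:copy-of select="Record"/>
          </Input>
        </xsl:result-document>
      </xsl:sequence>
    </xsl:fork>
  </xsl:template>

</xsl:stylesheet>
```

As written, this always creates `failed.input.xml`, even when there are no failures. The two branches only read the same children in a single pass, which is the case xsl:fork is designed for if the splitting pass is later made streamable.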

answered 2015-01-26T23:33:00.933