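I asked this question recently but realized I had not explained it clearly. I have a large .csv file of invoices (8,000+ rows), with multiple rows per invoice. I parse it into an XML structure like the one below (simplified).
Input 1 - $XMLInput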
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <row>
    <invoiceNumber>1</invoiceNumber>
    <invoiceText>invoice 1-1</invoiceText>
    <position>1</position>
    ...
  </row>
  <row>
    <invoiceNumber>1</invoiceNumber>
    <invoiceText>invoice 1-2</invoiceText>
    <position>2</position>
    ...
  </row>
  <row>
    <invoiceNumber>2</invoiceNumber>
    <invoiceText>invoice 2-1</invoiceText>
    <position>3</position>
    ...
  </row>
  <row>
    <invoiceNumber>2</invoiceNumber>
    <invoiceText>invoice 2-2</invoiceText>
    <position>4</position>
    ...
  </row>
  <row>
    <invoiceNumber>3</invoiceNumber>
    <invoiceText>invoice 3-1</invoiceText>
    <position>5</position>
    ...
  </row>
  <row>
    <invoiceNumber>3</invoiceNumber>
    <invoiceText>invoice 3-2</invoiceText>
    <position>6</position>
    ...
  </row>
</root>
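Input 2 - $maxBatchSize Description: a constant; once a batch grows larger than this size, break to the next batch
Input 3 - $listOfInvoices Description: a repeating variable holding the unique invoice numbers in the document. Example: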
<root>
  <row>
    <invoiceNumber>1</invoiceNumber>
  </row>
  <row>
    <invoiceNumber>2</invoiceNumber>
  </row>
  <row>
    <invoiceNumber>3</invoiceNumber>
  </row>
</root>
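For context, a list like this could be derived from Input 1 with xsl:for-each-group. This is only a minimal sketch, assuming the simplified /root/row/invoiceNumber structure shown above rather than the real element names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the input uses the simplified root/row/invoiceNumber layout. -->
  <xsl:template match="/root">
    <root>
      <!-- One group per distinct invoiceNumber, in order of first appearance. -->
      <xsl:for-each-group select="row" group-by="invoiceNumber">
        <row>
          <invoiceNumber>
            <xsl:value-of select="current-grouping-key()"/>
          </invoiceNumber>
        </row>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```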
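To improve performance, I need to group these elements by invoiceNumber, with each batch limited to X nodes (a variable passed in). From there I will send each batch to a subprocess in parallel instead of processing the whole original document at once. For example, for the sample XML document above, if the batch size cannot be greater than 3, I would need the following XML output:
Output 1 - $XMLOutput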
<root>
  <batch>
    <row>
      <invoiceNumber>1</invoiceNumber>
      <invoiceText>invoice 1-1</invoiceText>
      <position>1</position>
      ...
    </row>
    <row>
      <invoiceNumber>1</invoiceNumber>
      <invoiceText>invoice 1-2</invoiceText>
      <position>2</position>
      ...
    </row>
    <row>
      <invoiceNumber>2</invoiceNumber>
      <invoiceText>invoice 2-1</invoiceText>
      <position>3</position>
      ...
    </row>
    <row>
      <invoiceNumber>2</invoiceNumber>
      <invoiceText>invoice 2-2</invoiceText>
      <position>4</position>
      ...
    </row>
  </batch>
  <batch>
    <row>
      <invoiceNumber>3</invoiceNumber>
      <invoiceText>invoice 3-1</invoiceText>
      <position>5</position>
      ...
    </row>
    <row>
      <invoiceNumber>3</invoiceNumber>
      <invoiceText>invoice 3-2</invoiceText>
      <position>6</position>
      ...
    </row>
  </batch>
</root>
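The requirement is that all rows of an invoice are sent in the same batch. My initial XSLT (2.0) attempt is below. I tried to simulate a while loop by recursively calling a template that appends invoice groups to the current node. When the maximum batch size is reached, I recursively call the batch template to create a new batch. I pass the invoice and batch counters between the recursive calls.
Edit: Thanks to Ken's help, I am getting closer. I do need to break up the invoices by number of rows each time, not by number of distinct invoices. Even if the following works in theory, I am not sure how to make sure an invoice number does not already exist in a preceding sibling node.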
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xsl:variable name="batch-size" select="40" as="xs:integer"/>
  <xsl:variable name="input" select="bpws:getVariableData('sortedInvoicesByBU')"/>
  <xsl:key name="invoice-lines-by-invoice-number" match="row" use="invoiceNumber4z"/>
  <xsl:template match="/">
    <xsl:element name="batches">
      <!--establish batches from possible non-contiguous invoice numbers-->
      <xsl:for-each-group select="$input/*:UPSData/*:row" group-by="(position() - 1) idiv $batch-size">
        <xsl:for-each select="distinct-values($input/*:UPSData/*:row/*:invoiceNumber4z)[not(.=preceding-sibling::item)]">
          <xsl:element name="UPSData">
            <xsl:for-each select="current()">
              <xsl:for-each select="key('invoice-lines-by-invoice-number',.,$input)">
                <!--copy rows as they are-->
                <xsl:copy-of select="."/>
              </xsl:for-each>
            </xsl:for-each>
          </xsl:element>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
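To make the intended behavior concrete, here is a rough sketch of the recursive, row-count-based batching described above, written against the simplified root/row/invoiceNumber structure rather than the real UPSData/invoiceNumber4z names. The template name batch, the key name rows-by-invoice, and the parameter names are made up for illustration. It assumes a batch is closed once its row count becomes greater than $maxBatchSize, which is what the sample output implies (4 rows in the first batch with a limit of 3).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the limit is passed in; 3 matches the sample above. -->
  <xsl:param name="maxBatchSize" as="xs:integer" select="3"/>

  <!-- All rows of one invoice, looked up by invoice number. -->
  <xsl:key name="rows-by-invoice" match="row" use="invoiceNumber"/>

  <xsl:template match="/root">
    <root>
      <xsl:call-template name="batch">
        <!-- First row of each invoice gives the distinct numbers in document order. -->
        <xsl:with-param name="invoices"
            select="row[. is key('rows-by-invoice', invoiceNumber)[1]]/invoiceNumber"/>
      </xsl:call-template>
    </root>
  </xsl:template>

  <!-- invoices: invoice numbers not yet placed; current: rows of the open batch. -->
  <xsl:template name="batch">
    <xsl:param name="invoices" as="element(invoiceNumber)*"/>
    <xsl:param name="current" as="element(row)*" select="()"/>
    <xsl:choose>
      <!-- Close the batch once it exceeds the limit, or when the input is used up. -->
      <xsl:when test="count($current) gt $maxBatchSize
                      or (empty($invoices) and exists($current))">
        <batch>
          <xsl:copy-of select="$current"/>
        </batch>
        <xsl:if test="exists($invoices)">
          <!-- Start a new, empty batch with the remaining invoices. -->
          <xsl:call-template name="batch">
            <xsl:with-param name="invoices" select="$invoices"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <!-- Otherwise append every row of the next invoice to the open batch. -->
      <xsl:when test="exists($invoices)">
        <xsl:call-template name="batch">
          <xsl:with-param name="invoices" select="subsequence($invoices, 2)"/>
          <xsl:with-param name="current"
              select="$current, key('rows-by-invoice', $invoices[1])"/>
        </xsl:call-template>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because whole invoices are appended at once, an invoice's rows can never be split across batches, and each invoice number is consumed exactly once, so it cannot reappear in a later batch. The recursion depth grows with the number of invoices, so for a large file this relies on the processor handling tail calls well.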