I have a large number of reports being loaded into a chunk-partitioned step. Each report is then processed further to generate an individual output report. However, if I load all 50k reports into the partition step at once, it overloads the server and everything becomes very slow. Instead, I would prefer the partition step to load a list of 3k reports, process it, then load the next 3k reports into the partition step, and continue the same way until all 50k reports have been processed.

My step configuration:
<step id="genReport" next="fileTransfer">
    <chunk item-count="1000">
        <reader ref="Reader"/>
        <writer ref="Writer"/>
    </chunk>
    <partition>
        <mapper ref="Mapper">
            <properties>
                <property name="threadCount" value="#{jobProperties['threadCount']}"/>
                <property name="threadNumber" value="#{partitionPlan['threadNumber']}"/>
            </properties>
        </mapper>
    </partition>
</step>
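
For reference, the Reader referenced in the chunk picks up the per-partition properties that my mapper (shown below) sets, via standard batch-property injection. A minimal sketch, with the class name and the actual SQL-execution logic assumed:

import javax.batch.api.BatchProperty;
import javax.batch.api.chunk.AbstractItemReader;
import javax.inject.Inject;
import javax.inject.Named;

@Named("Reader")
public class ReportItemReader extends AbstractItemReader {

    // Injected from the partition properties built in mapPartitions()
    @Inject
    @BatchProperty(name = "sqlId")
    private String sqlId;

    @Inject
    @BatchProperty(name = "threadNumber")
    private String threadNumber;

    @Override
    public Object readItem() throws Exception {
        // Execute the dynamic SQL identified by sqlId and return one report
        // row per call; return null once this partition's page is exhausted.
        return null; // placeholder for the actual query logic
    }
}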
My partition mapper:

import java.util.Properties;

import javax.batch.api.partition.PartitionPlan;
import javax.batch.api.partition.PartitionPlanImpl;

@Override
public PartitionPlan mapPartitions() throws Exception {
    PartitionPlanImpl partitionPlan = new PartitionPlanImpl();
    // DAO call (name illustrative) to load the reports count; the value
    // comes from the database and is huge, e.g. 20k to 40k.
    int numberOfPartitions = reportDao.loadReportsCount();
    partitionPlan.setThreads(getThreadCount());
    partitionPlan.setPartitions(numberOfPartitions);

    Properties[] props = new Properties[numberOfPartitions];
    for (int idx = 0; idx < numberOfPartitions; idx++) {
        Properties threadProperties = new Properties();
        threadProperties.setProperty("threadNumber", String.valueOf(idx));
        // Data pulled from the PriorityBlockingQueue
        GAHReportListData gahRptListData = gahReportListManager.getPageToProcess();
        String dynSqlId = gahRptListData.getDynSqlId();
        threadProperties.setProperty("sqlId", dynSqlId);
        threadProperties.setProperty("outFile", fileName);
        props[idx] = threadProperties;
    }
    partitionPlan.setPartitionProperties(props);
    return partitionPlan;
}
Once the partition mapper has dispatched 3k reports for processing, it must check for the next available list. If one is available, the partitions should be reset with the next set of 3k reports to process.
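
For illustration, here is a minimal sketch of the direction I have in mind: cap each pass at 3k partitions and hand out only that many pages, leaving the rest queued for the next pass. MAX_PARTITIONS_PER_PASS is a made-up constant, and pendingSqlIds stands in for the PriorityBlockingQueue behind gahReportListManager; re-running the step for the next 3k would still need a loop in the job XML (e.g. a decision element transitioning back to genReport), assuming the runtime allows it:

import java.util.Properties;
import java.util.concurrent.PriorityBlockingQueue;

import javax.batch.api.partition.PartitionMapper;
import javax.batch.api.partition.PartitionPlan;
import javax.batch.api.partition.PartitionPlanImpl;

public class BatchedReportMapper implements PartitionMapper {

    private static final int MAX_PARTITIONS_PER_PASS = 3000; // assumed batch cap

    // Stands in for the PriorityBlockingQueue that gahReportListManager drains.
    private final PriorityBlockingQueue<String> pendingSqlIds = new PriorityBlockingQueue<>();

    @Override
    public PartitionPlan mapPartitions() throws Exception {
        // Plan only as many partitions as there are pending pages, capped per pass;
        // anything beyond the cap stays in the queue for a later pass.
        int partitionCount = Math.min(pendingSqlIds.size(), MAX_PARTITIONS_PER_PASS);

        Properties[] props = new Properties[partitionCount];
        for (int idx = 0; idx < partitionCount; idx++) {
            Properties p = new Properties();
            p.setProperty("threadNumber", String.valueOf(idx));
            p.setProperty("sqlId", pendingSqlIds.poll()); // next page for this partition
            props[idx] = p;
        }

        PartitionPlanImpl plan = new PartitionPlanImpl();
        plan.setPartitions(partitionCount);
        plan.setThreads(Math.min(partitionCount, 10)); // bounded worker pool, mirrors threadCount
        plan.setPartitionProperties(props);
        return plan;
    }
}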