We've set up a Spring Batch batch email-sender job that does the following:
- Read incoming email details from a JMS Queue.
- Don't do any processing, just pass them through...
- Turn the details into emails and "write" them to SMTP
- Repeat ad infinitum
This generally works fine, but we've discovered a massive memory leak. After 9 hours or so, can see a single FlowJob
, but 748 JobExecution
(all 'STARTED'), each holding 778 (!) StepExecution
instances. All in all, 900 MB of stuff.
Here's the Spring config (Spring 3.1, Spring Batch 2.1.9):
<bean id="jobRepository"
class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />
<bean id="jobLauncher"
class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<property name="taskExecutor">
<bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
</property>
</bean>
<bean id="jmsEmailFetcher" class="org.springframework.batch.item.jms.JmsItemReader">
<property name="jmsTemplate" ref="batchEmailJmsTemplate" />
</bean>
<bean id="passthroughProcessor" class="org.springframework.batch.item.support.PassThroughItemProcessor" />
<bean id="transactionManager"
class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<!-- The Spring Batch *Limiter/Decider* -->
<bean id="ourLimitDecider" class="our.special.InfiniteThrottledExecutionDecider" />
<!-- The Spring Batch *Job* -->
<batch:job id="fetchEmailsJobBatch" restartable="true">
<batch:step id="fetchEmailsStep" next="limitDecision">
<batch:tasklet throttle-limit="10">
<batch:chunk reader="jmsEmailFetcher" processor="passthroughProcessor"
writer="batchEmailService.javaMailItemWriter" commit-interval="100" skip-limit="999999999"> <!-- Might gets *lots* of MQ-not-up-yet fails -->
<batch:skippable-exception-classes>
<batch:include class="org.springframework.jms.JmsException" /> <!-- Generally MQ-not-up-yet ConnectionException, or Session-closed (in tests) -->
<batch:include class="java.lang.IllegalStateException" /> <!-- Yuk, usually/presumably a test SMTP server isn't yet connected -->
<batch:include class="org.springframework.mail.MailSendException" /> <!-- SMTP down... -->
</batch:skippable-exception-classes>
</batch:chunk>
</batch:tasklet>
</batch:step>
<batch:decision id="limitDecision" decider="ourLimitDecider">
<batch:next on="CONTINUE" to="fetchEmailsStep" />
<batch:end on="COMPLETED" />
</batch:decision>
</batch:job>
Our InfiniteThrottledExecutionDecider
basically returns new FlowExecutionStatus("CONTINUE")
every time, to ensure that the fetchEmailsStep
gets executed at the end of the flow, and that the Job never completes - at least not until we're ready to stop it ourselves.
We're not using a database, so we expect some stuff to be held in memory, but not a complete record of everything that's ever been run...
Is there something wrong with our configuration? Or our approach?
Here's a bit from our log file for what we're told is Step #778 and what should be the one and only Job instance.
23:58:18,782 - INFO (org.springframework.batch.core.job.SimpleStepHandler) - Duplicate step [fetchEmailsStep] detected in execution of job=[fetchEmailsJobBatch]. If either step fails, both will be executed again on restart.
23:59:52,257 - INFO (org.springframework.batch.core.job.SimpleStepHandler) - Executing step: [fetchEmailsStep]
23:59:52,257 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Executing: id=778
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context.
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1
23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Preparing chunk execution for StepContext: org.springframework.batch.core.scope.context.StepContext@1be1ee
23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Chunk execution starting: queue size=0
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context.
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1
... 5 second JMS timeout ...
23:59:57,716 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value.
23:59:57,716 - DEBUG (org.springframework.batch.core.step.item.ChunkOrientedTasklet) - Inputs not busy, ended: true
23:59:57,716 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Applying contribution: [StepContribution: read=0, written=0, filtered=0, readSkips=0, writeSkips=0, processSkips=0, exitStatus=EXECUTING]
23:59:57,719 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Saving step execution before commit: StepExecution: id=778, version=1, name=fetchEmailsStep, status=STARTED, exitStatus=EXECUTING, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0, exitDescription=
23:59:57,721 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value.
23:59:57,721 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution success: id=778
23:59:57,722 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution complete: StepExecution: id=778, version=3, name=fetchEmailsStep, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.fetchEmailsStep with status=COMPLETED
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.limitDecision100
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.limitDecision100 with status=CONTINUE
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.fetchEmailsStep
Problem is, the heap dump shows 748 JobExecution
s, and 581,911 StepExecution
s. Where are they all coming from?