I have a pool of jobs from which I retrieve jobs and start them. The pattern looks something like this:

    Job job = JobPool.getJob();
    job.waitForCompletion();
    JobPool.release(job);

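I run into a problem when I try to reuse a job object: it does not even run (most probably because its state is COMPLETED). So, in the snippet below, the second waitForCompletion call prints the job's statistics/counters and does nothing else.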

    Job jobX = JobPool.getJob();
    jobX.waitForCompletion();
    JobPool.release(jobX);

    //.......

    jobX = JobPool.getJob(); // reassign; redeclaring jobX in the same scope would not compile
    jobX.waitForCompletion(); // <--- here the job should run, but it doesn't 

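Am I right in saying that the job does not actually run because Hadoop sees its state as completed and does not attempt to run it? If so, do you know how to reset a job object so that I can reuse it?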

1 Answer

The Javadoc for Job includes this hint that a job is meant to be run only once:

"The set methods only work until the job is submitted, afterwards they will throw an IllegalStateException."

I think there is some confusion between the job and the view of the job. What you have is the latter, and it is designed to map to at most one job running in Hadoop. The view of the job is fundamentally lightweight, so if creating that object is expensive relative to actually running the job, I have to believe that your jobs are simple enough that you don't need Hadoop.

Using the view to submit a job is potentially expensive (copying jars into the cluster, initializing the job in the JobTracker, and so on), so conceptually the idea of telling the JobTracker to "rerun " or "copy ; run " makes sense. As far as I can tell, though, there is no support for either of those ideas in practice. I suspect that Hadoop does not guarantee the retention policies that would be needed to support either use case.
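To make the "at most one job per view" point concrete, here is a minimal sketch in plain Java. It does not use Hadoop: OneShotJob and runRepeatedly are hypothetical names invented for illustration, and the DEFINE/RUNNING state check is my assumption about how the real Job behaves, based on the Javadoc hint above and the behaviour described in the question. It shows why a second waitForCompletion on the same object does no work, and the usual alternative of building a fresh object for every run.

```java
import java.util.function.Supplier;

final class JobReuseSketch {

    /** Simplified stand-in for a job view: it can be submitted at most once. */
    static final class OneShotJob {
        enum State { DEFINE, RUNNING }

        private State state = State.DEFINE;
        private String name = "";
        private int runs = 0;

        /** Setters only work before submission, mirroring the Javadoc hint. */
        void setJobName(String name) {
            if (state != State.DEFINE) {
                throw new IllegalStateException(
                        "Job in state " + state + " instead of " + State.DEFINE);
            }
            this.name = name;
        }

        String getJobName() { return name; }

        /** Submits only if the job was never submitted; otherwise just reports. */
        boolean waitForCompletion() {
            if (state == State.DEFINE) {
                state = State.RUNNING;
                runs++; // the actual work would happen only here
            }
            return true;
        }

        int runs() { return runs; }
    }

    /** Runs the work `times` times by asking the factory for a fresh job each time. */
    static int runRepeatedly(Supplier<OneShotJob> factory, int times) {
        int total = 0;
        for (int i = 0; i < times; i++) {
            OneShotJob job = factory.get(); // new object per run, never reused
            job.waitForCompletion();
            total += job.runs();
        }
        return total;
    }

    public static void main(String[] args) {
        // Reusing one object: the second call is a no-op.
        OneShotJob pooled = new OneShotJob();
        pooled.waitForCompletion();
        pooled.waitForCompletion();
        System.out.println("pooled object ran " + pooled.runs() + " time(s)");

        // Pooling the recipe instead of the object: every run does work.
        int total = runRepeatedly(() -> {
            OneShotJob job = new OneShotJob();
            job.setJobName("example");
            return job;
        }, 2);
        System.out.println("fresh objects ran " + total + " time(s)");
    }
}
```

With the real API, the equivalent would presumably be to keep the shared Configuration (or a small factory that applies your settings) in the pool and construct a new Job from it for each run, for example with new Job(conf) or, in later releases, Job.getInstance(conf), rather than pooling submitted Job objects.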

answered 2012-10-15T22:07:23.547