java - "Failed to report status for 600 seconds. Killing!" Reporting progress in Hadoop

Question

I am receiving the following error:

Task attempt_201304161625_0028_m_000000_0 failed to report status for 600 seconds. Killing!

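for my map job. This question is similar to this one, this one, and this one. However, I do not want to increase the default time before Hadoop kills a task that does not report progress, i.e.,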

Configuration conf=new Configuration();
long milliSeconds = 1000*60*60;
conf.setLong("mapred.task.timeout", milliSeconds);

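Instead, I want to report progress periodically using context.progress(), context.setStatus("Some Message"), context.getCounter(SOME_ENUM.PROGRESS).increment(1), or something similar. However, this still causes the job to be killed. Here are the snippets of code where I am attempting to report progress. The mapper: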

protected void map(Key key, Value value, Context context) throws IOException, InterruptedException {

    //do some things
    Optimiser optimiser = new Optimiser();
    optimiser.optimiseFurther(<some parameters>, context);
    //more things
    context.write(newKey, newValue);
}

The optimiseFurther method in the Optimiser class:

public void optimiseFurther(<Some parameters>, TaskAttemptContext context) {

    int count = 0;
    while(something is true) {
        //optimise

        //try to report progress
        context.setStatus("Progressing:" + count);
        System.out.println("Optimise Progress:" + context.getStatus());
        context.progress();
        count++;
    }
}

The output from the mapper shows that the status is being updated:

Optimise Progress:Progressing:0
Optimise Progress:Progressing:1
Optimise Progress:Progressing:2
...

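However, the job still gets killed after the default timeout. Am I using the context the wrong way? Is there anything else I need to do in the job setup for progress to be reported successfully?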

score 7 · Accepted Answer

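This problem is related to a bug in Hadoop 0.20 whereby calls to context.setStatus() and context.progress() are not reported to the underlying framework (calls that set the various counters do not work either). A patch is available, so updating to a newer version of Hadoop should fix the problem.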

score 6 · Accepted Answer

What might be happening is that the progress methods found on Context have to be called on the Reporter itself, and calling them on the Context may not work.

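From Cloudera:

Reporting progress

If your task does not report progress for 10 minutes (see the mapred.task.timeout property), it will be killed by Hadoop. Most tasks never hit this limit because they report progress implicitly by reading input and writing output. However, some jobs that do not process records this way may run afoul of it and have their tasks killed. Simulations are a good example, since they do a lot of CPU-intensive processing in each map and typically only write the result at the end of the computation. They should be written so that they report progress periodically (more often than every 10 minutes). This can be achieved in a number of ways: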

- Call setStatus() on Reporter to set a human-readable description of the task's progress
- Call incrCounter() on Reporter to increment a user counter
- Call progress() on Reporter to tell Hadoop that your task is still there (and making progress)
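
The calls listed above could be wrapped in a small helper that reports at most once per interval, so a tight loop does not report on every iteration. This is only a minimal sketch and not Hadoop code: `ProgressSink` is a hypothetical stand-in for Hadoop's Reporter or Context so that the example compiles without Hadoop on the classpath, and `ThrottledProgress`, `tick`, and the injected clock are names assumed for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for Hadoop's Reporter / Context; not a Hadoop type.
interface ProgressSink {
    void setStatus(String message);
    void progress();
}

// Reports progress at most once per interval, however often tick() is called.
final class ThrottledProgress {
    private final ProgressSink sink;
    private final long intervalMillis;
    private final LongSupplier clock; // injectable so the helper can be tested
    private long lastReport;
    private boolean reportedOnce = false;

    ThrottledProgress(ProgressSink sink, long intervalMillis, LongSupplier clock) {
        this.sink = sink;
        this.intervalMillis = intervalMillis;
        this.clock = clock;
    }

    // Call once per loop iteration; returns true if a report was sent.
    boolean tick(long iteration) {
        long now = clock.getAsLong();
        if (reportedOnce && now - lastReport < intervalMillis) {
            return false; // too soon since the last report
        }
        sink.setStatus("Progressing:" + iteration);
        sink.progress();
        lastReport = now;
        reportedOnce = true;
        return true;
    }
}
```

In a real job the sink would presumably delegate to context.setStatus(...) and context.progress(), the clock would be System::currentTimeMillis, and the interval would be chosen well below mapred.task.timeout (for example 60 seconds against the 600-second default). Note that throttling only helps if the underlying calls actually reach the framework.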

Cloudera Tips

public Context(Configuration conf, TaskAttemptID taskid,
               RecordReader<KEYIN,VALUEIN> reader,
               RecordWriter<KEYOUT,VALUEOUT> writer,
               OutputCommitter committer,
               StatusReporter reporter,
               InputSplit split)
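
The constructor above takes a StatusReporter, which suggests that progress calls only reach the framework if the context forwards them to that reporter. The sketch below illustrates the difference under that assumption. It is not Hadoop's actual source: `SketchReporter`, `LocalOnlyContext`, and `ForwardingContext` are hypothetical names invented for this example.

```java
// Hypothetical stand-in for a framework-side reporter; not a Hadoop type.
interface SketchReporter {
    void setStatus(String status);
    void progress();
}

// Keeps the status locally only: getStatus() reflects updates,
// but nothing is ever forwarded to a reporter.
class LocalOnlyContext {
    private String status = "";

    public void setStatus(String msg) { status = msg; }
    public String getStatus() { return status; }
    public void progress() { /* intentionally a no-op in this sketch */ }
}

// Same surface, but every call is also forwarded to the injected reporter.
class ForwardingContext extends LocalOnlyContext {
    private final SketchReporter reporter;

    ForwardingContext(SketchReporter reporter) { this.reporter = reporter; }

    @Override
    public void setStatus(String msg) {
        super.setStatus(msg);    // keep the local copy in sync
        reporter.setStatus(msg); // and tell the reporter
    }

    @Override
    public void progress() { reporter.progress(); }
}
```

With a context that behaves like `LocalOnlyContext`, printing context.getStatus() would still show "Progressing:0", "Progressing:1", and so on, even though no report ever leaves the task. That would be consistent with the output shown in the question, so the printed status alone does not prove that the framework received anything.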
