
All jobs were running successfully under hadoop-streaming, but suddenly I started seeing errors from one of the worker machines:

Hadoop job_201110302152_0002 failures on master

Attempt Task    Machine State   Error   Logs
attempt_201110302152_0002_m_000037_0    task_201110302152_0002_m_000037 worker2 FAILED  
Task attempt_201110302152_0002_m_000037_0 failed to report status for 622 seconds. Killing!
-------
Task attempt_201110302152_0002_m_000037_0 failed to report status for 601 seconds. Killing!

Questions :

- Why is this happening?
- How can I handle such issues?

Thank you


1 Answer


The description of mapred.task.timeout, which defaults to 600 seconds, says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."
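If the task genuinely needs more time, the timeout can be raised per job with a -D option on the streaming command line. A rough sketch, in which the streaming jar path, mapper/reducer scripts, and input/output paths are placeholders for your own:

```shell
# Raise the per-task timeout to 20 minutes (the value is in milliseconds).
# Jar location and job arguments below are illustrative placeholders.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.task.timeout=1200000 \
    -input /user/me/input \
    -output /user/me/output \
    -mapper my_mapper.py \
    -reducer my_reducer.py
```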

Increasing mapred.task.timeout may make the problem go away, but you need to figure out whether the map task really needs more than 600 s to process its input, or whether there is a bug in the code that needs to be found and fixed.
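Alternatively, a long-running streaming task can keep itself alive without touching the timeout: Hadoop streaming treats stderr lines of the form reporter:status:&lt;message&gt; as status updates, which reset the progress timer. A minimal sketch of such a mapper; process_record and the 60-second interval are assumptions standing in for your real per-record work:

```python
#!/usr/bin/env python
import sys
import time

REPORT_INTERVAL = 60  # seconds between status updates (assumption)

def report_status(message, err=sys.stderr):
    # Hadoop streaming parses "reporter:status:<message>" lines on stderr
    # and resets the task's progress timer.
    err.write("reporter:status:%s\n" % message)
    err.flush()

def process_record(line):
    # Hypothetical placeholder for expensive per-record work.
    return line.strip().upper()

def run(stdin=sys.stdin, stdout=sys.stdout, err=sys.stderr):
    last_report = time.time()
    for n, line in enumerate(stdin, 1):
        stdout.write(process_record(line) + "\n")
        if time.time() - last_report >= REPORT_INTERVAL:
            report_status("processed %d records" % n, err)
            last_report = time.time()

if __name__ == "__main__":
    run()
```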

According to Hadoop best practices, a map task should on average take about a minute to process an InputSplit.

Answered 2011-10-31T12:19:37.063