
All jobs were running successfully under hadoop-streaming, but suddenly I started seeing errors from one of the worker machines:

Hadoop job_201110302152_0002 failures on master

Attempt Task    Machine State   Error   Logs
attempt_201110302152_0002_m_000037_0    task_201110302152_0002_m_000037 worker2 FAILED  
Task attempt_201110302152_0002_m_000037_0 failed to report status for 622 seconds. Killing!
-------
Task attempt_201110302152_0002_m_000037_0 failed to report status for 601 seconds. Killing!

Questions :

- Why is this happening?
- How can I handle such issues?

Thank you


1 Answer


The description of mapred.task.timeout, which defaults to 600 seconds, says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."
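If the task genuinely needs more time, the timeout can be raised per job with a -D option on the streaming command line. A rough sketch, in which the streaming jar path, mapper/reducer scripts, and input/output paths are placeholders for your own:

```shell
# Raise the per-task timeout to 20 minutes (the value is in milliseconds).
# Jar location and job arguments below are illustrative placeholders.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.task.timeout=1200000 \
    -input /user/me/input \
    -output /user/me/output \
    -mapper my_mapper.py \
    -reducer my_reducer.py
```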

Increasing mapred.task.timeout may make the problem go away, but you need to figure out whether the map task really needs more than 600 s to process its input, or whether there is a bug in the code that needs to be found and fixed.
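Alternatively, a long-running streaming task can keep itself alive without touching the timeout: Hadoop streaming treats stderr lines of the form reporter:status:&lt;message&gt; as status updates, which reset the progress timer. A minimal sketch of such a mapper; process_record and the 60-second interval are assumptions standing in for your real per-record work:

```python
#!/usr/bin/env python
import sys
import time

REPORT_INTERVAL = 60  # seconds between status updates (assumption)

def report_status(message, err=sys.stderr):
    # Hadoop streaming parses "reporter:status:<message>" lines on stderr
    # and resets the task's progress timer.
    err.write("reporter:status:%s\n" % message)
    err.flush()

def process_record(line):
    # Hypothetical placeholder for expensive per-record work.
    return line.strip().upper()

def run(stdin=sys.stdin, stdout=sys.stdout, err=sys.stderr):
    last_report = time.time()
    for n, line in enumerate(stdin, 1):
        stdout.write(process_record(line) + "\n")
        if time.time() - last_report >= REPORT_INTERVAL:
            report_status("processed %d records" % n, err)
            last_report = time.time()

if __name__ == "__main__":
    run()
```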

According to Hadoop best practices, a map task should on average take about a minute to process an InputSplit.

Answered 2011-10-31T12:19:37.063