
By default, if a mapper/reducer fails, Hadoop retries it with another instance, and if it fails 4 times (the default value), Hadoop marks the entire MR job as failed.

I am processing some raw data, and I am OK with the MR job failing to process 30% of it. Is there any configuration I can set so that if up to 30% of mappers fail, the job is not killed and still produces output for the remaining 70% of the data? I can handle exceptions in my code and track failed and successful records with counters (see the sketch below), but I want to know whether Hadoop has such a config.
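For context, a minimal sketch of the counter-based handling I mean, assuming a simple text mapper (the counter group/names and the `parse` helper are just illustrative, not real API):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RawDataMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // parse(...) stands in for the real record-parsing logic.
            String parsed = parse(value.toString());
            context.getCounter("Records", "SUCCESS").increment(1);
            context.write(new Text(parsed), value);
        } catch (Exception e) {
            // Swallow the bad record instead of letting the task fail.
            context.getCounter("Records", "FAILED").increment(1);
        }
    }

    private String parse(String raw) {
        // Placeholder parsing step.
        return raw.trim();
    }
}
```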


1 Answer


Thanks! I found the answer in the Definitive Guide.

For some applications, it is undesirable to abort the job if a few tasks fail, as it may be possible to use the results of the job despite some failures. In this case, the maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job. Map tasks and reduce tasks are controlled independently, using the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties.
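A minimal driver sketch of how these properties might be set (the class name and the 30% threshold are just this question's example; on older Hadoop releases the equivalent properties were mapred.max.map.failures.percent and mapred.max.reduce.failures.percent):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LenientJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Allow up to 30% of map tasks and 30% of reduce tasks to fail
        // without failing the whole job (both default to 0).
        conf.setInt("mapreduce.map.failures.maxpercent", 30);
        conf.setInt("mapreduce.reduce.failures.maxpercent", 30);

        Job job = Job.getInstance(conf, "lenient job");
        // ... set mapper, reducer, input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the driver uses ToolRunner/GenericOptionsParser, the same properties can also be passed on the command line, e.g. `-D mapreduce.map.failures.maxpercent=30`.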

Answered on 2013-07-04T20:24:14.667