When I checked the Hadoop web UI, I found that some of the reduce tasks reach 66.66% and then stay there for a long time. Their counters show the number of reduce input records as zero.
After a long time, they get their input records and start processing them. Some show zero input records for even longer and are killed with the error "Task attempt failed to report status for 600 seconds".
But some of the reducers show input records in their counters immediately and start processing them right away.
I do not know why some reducers take so long to get their input records. This happens only with this program, not with my other programs.
In this MapReduce job, the reducer's configure method, which runs before the first call to reduce, reads a lot of data from the distributed cache. Is this the reason? I am not sure.
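
One way to check whether the cache loading accounts for the delay would be to time it. The following is only a minimal sketch in plain Java, not the code from my job: the class name TimedCacheLoader, its methods, and the tab-separated "key<TAB>value" file format are all hypothetical, chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: loads a side file into memory and records how long
 * the load took. The "key<TAB>value" line format is an assumption.
 */
class TimedCacheLoader {

    private final Map<String, String> lookup = new HashMap<>();
    private long loadMillis = -1; // -1 means load() has not completed yet

    /** Reads every well-formed line of a local file into the lookup map. */
    public void load(Path localFile) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a tab separator
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        loadMillis = (System.nanoTime() - start) / 1_000_000;
        // In a Hadoop task, stderr output ends up in the task attempt's log.
        System.err.println("Loaded " + lookup.size() + " entries in "
                + loadMillis + " ms");
    }

    public String get(String key) {
        return lookup.get(key);
    }

    public int size() {
        return lookup.size();
    }

    public long getLoadMillis() {
        return loadMillis;
    }
}
```

In the job, load would presumably be called from configure with the local path of a cached file, for example one returned by DistributedCache.getLocalCacheFiles. If the logged time for the slow reducers is close to the task timeout (mapred.task.timeout, 600000 ms by default), that would be consistent with the reducers sitting at 66.66% with zero input records until configure returns.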