
I am new to Hadoop. I remember learning somewhere that in Hadoop, all map functions have to complete before any reduce function can start.

But when I ran a MapReduce program, I got a printout like this:

map(15%), reduce(5%)
map(20%), reduce(7%)
map(30%), reduce(10%)
map(38%), reduce(17%)
map(40%), reduce(25%)

Why do they run in parallel?


1 Answer


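Before the actual reduce phase starts, the shuffle, sort, and merge steps take place as mappers keep completing. That is what the reduce percentage indicates; it is not the actual reduce phase. This work happens in parallel with the maps to avoid the overhead that would otherwise be incurred if the framework waited for all mappers to finish and only then did the shuffle, sort, and merge.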
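To make the percentages in the question concrete, here is a minimal Java sketch that guesses which step a reducer is in from the reported reduce percentage. It assumes reduce progress is split into three equally weighted steps (copy/shuffle, sort/merge, reduce), which is how Hadoop's progress reporting is commonly described; the class `ReduceProgress` and the method `phaseOf` are hypothetical names made up for this illustration, not part of the Hadoop API.

```java
// Sketch only: interprets the reduce percentage printed by the job client.
// Assumption: reduce progress is divided into three equally weighted steps
// (copy/shuffle, sort/merge, reduce). ReduceProgress and phaseOf are
// illustrative names, not Hadoop classes or methods.
public class ReduceProgress {

    /** Returns the step a reducer is most likely in for a percentage in [0, 100]. */
    public static String phaseOf(double reducePercent) {
        if (reducePercent < 0 || reducePercent > 100) {
            throw new IllegalArgumentException("percentage must be in [0, 100]");
        }
        if (reducePercent < 100.0 / 3) {
            return "copy (shuffle)";   // fetching map output while maps still run
        }
        if (reducePercent < 200.0 / 3) {
            return "sort (merge)";     // merging the fetched map output
        }
        return "reduce";               // user-defined reduce() is being called
    }

    public static void main(String[] args) {
        // Reduce percentages taken from the printout in the question.
        double[] reported = {5, 7, 10, 17, 25};
        for (double p : reported) {
            System.out.println("reduce(" + p + "%) -> " + phaseOf(p));
        }
    }
}
```

Under that assumption, every value in the question's printout (5% to 25%) falls in the copy step, which matches the explanation above: the reducers are only fetching map output, and the user's reduce function has not been called yet. How early reducers start copying is controlled by the `mapreduce.job.reduce.slowstart.completedmaps` property (named `mapred.reduce.slowstart.completed.maps` in older releases).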

answered 2013-09-13T19:43:45.323