I'm confused about the MapReduce framework. I've been reading about it from different sources and they don't all agree. By the way, this is my mental model of a MapReduce job:
1. Map() --> emits <key, value> pairs
2. Partitioner (OPTIONAL) --> divides the intermediate output from the
   mappers and assigns it to the different reducers
3. Shuffle phase, used to build: <key, list of values>
4. Combiner, a component used like a mini-reducer which performs some
   operations on the data and then passes that data to the reducer.
   The combiner runs on the local node, not on HDFS, saving space and time.
5. Reducer, gets the data from the combiner, performs further
   operations (probably the same as the combiner), then writes the
   output.
6. We will have n output parts, where n is the number of reducers
   (see the word-count sketch after this list).
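To make the list concrete, here is roughly how I picture the pieces being wired together, based on the standard Hadoop word-count example (the class names, the input/output paths taken from args, and the choice of 2 reduce tasks are just my own placeholders for illustration). The same IntSumReducer class is registered both as the combiner (step 4) and as the reducer (step 5):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Step 1: Map() emits <word, 1> for every token in the input line
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit <key, value>
          }
        }
      }

      // Steps 4 and 5: the same class is used as the combiner (local
      // pre-aggregation on the map side) and as the reducer (final
      // aggregation after the shuffle)
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // step 4: optional combiner
        job.setReducerClass(IntSumReducer.class);  // step 5: reducer
        // Step 2: no setPartitionerClass() call, so the default
        // hash partitioner decides which reducer gets each key
        job.setNumReduceTasks(2);                  // step 6: n = 2 output parts
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // placeholder input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

As I understand it, reusing the reducer class as the combiner only works here because summing counts is associative and commutative, which is why the framework treats the combiner as optional.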
Is this basically right? I mean, I've found some sources saying that the combiner is the shuffle phase, and that it basically groups by each record...