
The original MapReduce execution chain is: InputSplits -> Mapper -> [sorting/shuffling, etc.] -> Reducer -> ...

Now I don't want the input splits to go to the Mappers first. Instead, they should go to a new stage (call it Pre-Mapper, a class I will write myself).

So the new order will be: InputSplits -> Pre-Mapper -> Mapper -> ...

I'm currently reading the Hadoop source code, but I still can't tell which classes I should modify.

Any suggestion is welcome. Thank you very much :)


3 Answers


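Maybe you should take a look at chaining mappers with ChainMapper, which runs several Mapper classes one after another inside the same map task, so the output of one mapper becomes the input of the next.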
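In Hadoop's new API the class is `org.apache.hadoop.mapreduce.lib.chain.ChainMapper` (the old API has `org.apache.hadoop.mapred.lib.ChainMapper`). In the job driver you call `ChainMapper.addMapper(...)` once for the Pre-Mapper and once for the existing Mapper, in the order they should run, and the output key/value classes of each mapper must match the input classes of the next. Since code against the real API needs Hadoop on the classpath, the sketch below is only a plain-Java model of the same idea: each record from a split passes through the first stage, and every record it emits goes straight to the second stage with no sort or shuffle in between. `Stage`, `PRE_MAPPER`, `MAPPER` and `runChain` are hypothetical names, and the trim/lower-case logic is placeholder work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Plain-Java model (no Hadoop dependency) of chaining mappers inside one map task.
 * All names here are hypothetical; this is not the Hadoop ChainMapper API.
 */
public class ChainModel {

    /** Hypothetical stand-in for a Mapper: consumes one record, emits zero or more. */
    interface Stage<I, O> {
        void map(I record, Consumer<O> emit);
    }

    /** Hypothetical Pre-Mapper: trims each line and drops blank ones. */
    static final Stage<String, String> PRE_MAPPER = (line, emit) -> {
        String trimmed = line.trim();
        if (!trimmed.isEmpty()) {
            emit.accept(trimmed);
        }
    };

    /** Hypothetical original Mapper: emits each word of a line in lower case. */
    static final Stage<String, String> MAPPER = (line, emit) -> {
        for (String word : line.split("\\s+")) {
            emit.accept(word.toLowerCase());
        }
    };

    /** Runs first -> second over one split, record by record. */
    static <A, B, C> List<C> runChain(List<A> split, Stage<A, B> first, Stage<B, C> second) {
        List<C> out = new ArrayList<>();
        for (A record : split) {
            // Every output of the first stage is handed directly to the second stage.
            first.map(record, mid -> second.map(mid, out::add));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> split = List.of("  Hello World ", "", "Foo bar");
        System.out.println(runChain(split, PRE_MAPPER, MAPPER));
        // prints [hello, world, foo, bar]
    }
}
```

The blank line is dropped by the Pre-Mapper, so the word-splitting Mapper never sees it.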

answered 2013-10-16T08:10:40.793

You can implement it as two MapReduce jobs run one after the other:

Stage one: Mapper -> sorting/shuffling -> Reducer (this reducer does nothing but write out the data it receives from the Mapper);

Stage two: your original MapReduce job, reading stage one's output as its input.

Stage one is where your Pre-Mapper logic goes.
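A plain-Java model of this two-job layout, assuming the Pre-Mapper only cleans up lines and the original job is a word count (both are placeholders; `runPreMapperJob` and `runWordCountJob` are hypothetical names). In a real Hadoop driver, stage one's output directory (set with `FileOutputFormat.setOutputPath`) would be passed to stage two with `FileInputFormat.addInputPath`. If stage one's reducer really does nothing, calling `job.setNumReduceTasks(0)` makes it a map-only job, which skips the sort/shuffle entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java model (no Hadoop dependency) of running a Pre-Mapper as its own job.
 * Job 1's returned list stands in for its output files; job 2 reads them back.
 */
public class TwoJobModel {

    /** Job 1: hypothetical Pre-Mapper only (map-only, so no sort/shuffle or reducer). */
    static List<String> runPreMapperJob(List<String> input) {
        List<String> intermediate = new ArrayList<>();
        for (String line : input) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {          // placeholder pre-processing: drop blank lines
                intermediate.add(trimmed.toLowerCase());
            }
        }
        return intermediate;                   // stands in for job 1's output directory
    }

    /** Job 2: the original map -> sort/shuffle -> reduce, modeled as a word count. */
    static Map<String, Integer> runWordCountJob(List<String> input) {
        // Map phase emits (word, 1); grouping into a sorted map models the sort/shuffle.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (String word : line.split("\\s+")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase sums the values for each key.
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> stageOne = runPreMapperJob(List.of("Foo bar", "   ", "bar BAZ"));
        System.out.println(runWordCountJob(stageOne));
        // prints {bar=2, baz=1, foo=1}
    }
}
```

Compared with chaining mappers in one task, this layout writes the intermediate data out and reads it back, which costs extra I/O but keeps the Pre-Mapper fully separate from the original job.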

answered 2013-10-16T08:15:09.603

You could consider overriding the MapRunner class, which owns the loop that reads each record from the split and calls map() on it.
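In the old API (`org.apache.hadoop.mapred`), `MapRunner` implements `MapRunnable`, and its `run(RecordReader, OutputCollector, Reporter)` method is the loop that reads records and calls `map()`; a custom runner is registered with `JobConf.setMapRunnerClass(...)`. In the new API (`org.apache.hadoop.mapreduce`) the comparable hook is overriding `Mapper.run(Context)`. The sketch below is only a plain-Java model of such a loop with a Pre-Mapper step inserted before `map()`. The iterator stands in for the RecordReader, and `preMap`, `map` and the de-duplication logic are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java model (no Hadoop dependency) of a custom map runner that calls a
 * hypothetical Pre-Mapper step on every record before the real map() method.
 */
public class PreMapRunnerModel {

    /** Stand-in for the existing Mapper: emits the length of each line. */
    static void map(String line, List<Integer> output) {
        output.add(line.length());
    }

    /**
     * Hypothetical Pre-Mapper: skips lines already seen in this split.
     * Returns null when the record should not reach map().
     */
    static String preMap(String line, Set<String> seen) {
        return seen.add(line) ? line : null;
    }

    /** Models the runner's loop: read, pre-map, map, until the reader is exhausted. */
    static List<Integer> run(Iterator<String> recordReader) {
        List<Integer> output = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // per-split state owned by the runner
        while (recordReader.hasNext()) {
            String record = preMap(recordReader.next(), seen);
            if (record != null) {
                map(record, output);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> split = List.of("abc", "de", "abc", "fghi");
        System.out.println(run(split.iterator()));
        // prints [3, 2, 4]
    }
}
```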

answered 2014-04-29T09:24:34.653