Is it mandatory to set the number of reducers in order to use a custom partitioner? Example: in the Word Count problem, I want the counts for all stop words to go to one partition and the counts for the remaining words to go to a different partition. If I set the number of reducers to two, send the stop words to one partition and everything else to the other, it works, but then I am restricting the number of reducers to two (or N), which I don't want. What is the best approach here? Or do I have to calculate and set the number of reducers based on the size of the input to get the best performance?


1 Answer



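The partitioner is handed the number of partitions, which equals the number of reduce tasks, on every call:

int getPartition(KEY key, VALUE value, int numPartitions)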

If you don't set a partitioner, HashPartitioner is used. Its implementation is simple:

public int getPartition(K key, V value, int numReduceTasks) {
    // Clear the sign bit so the result is never negative, then take the remainder.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
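To see why the mask is there, here is a small standalone sketch that applies the same arithmetic outside Hadoop. The class name HashPartitionDemo and the Object-keyed helper are made up for the illustration; only the expression itself comes from the snippet above.

```java
// Standalone illustration, no Hadoop on the classpath needed.
class HashPartitionDemo {
    // Same arithmetic as HashPartitioner, applied to a plain Object key.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        // Integer.hashCode() is the int value itself, so -7 gives a negative hash.
        Object key = -7;
        int unmasked = key.hashCode() % numReduceTasks;   // negative remainder
        int masked = getPartition(key, numReduceTasks);   // always in [0, numReduceTasks)
        System.out.println("unmasked=" + unmasked + " masked=" + masked);
    }
}
```

Without the mask, a key with a negative hashCode() would produce a negative partition number, which is not a valid reducer index.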



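This is always something you have to do, and it is independent of whether you use a custom partitioner. You have to set the number of reducers yourself; the default is 1, and Hadoop will not calculate this value for you.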
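If you do want to derive the number from the input size, one possible sketch is below. The heuristic (one reducer per fixed amount of input, capped at a maximum) and the names ReducerCountEstimate, estimateReducers, bytesPerReducer and maxReducers are assumptions made up for this example, not something Hadoop provides; the right values depend on your cluster and job.

```java
// Hypothetical sizing helper; not part of Hadoop.
class ReducerCountEstimate {
    // One reducer per bytesPerReducer of input, clamped to [1, maxReducers].
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division, so a partial chunk still gets a reducer.
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, needed));
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // 10 GiB of input, 1 GiB per reducer, at most 64 reducers.
        System.out.println(estimateReducers(10 * oneGiB, oneGiB, 64));
    }
}
```

In the driver you would then pass the result to Job.setNumReduceTasks(int) before submitting the job.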


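public int getPartition(K key, V value, int numReduceTasks) {
    // isStopWord is a helper you have to supply; with one reducer everything goes to partition 0.
    if (numReduceTasks == 1 || isStopWord(key)) {
        return 0; // partition 0 is reserved for stop words
    } else {
        // Spread all other keys over partitions 1 .. numReduceTasks - 1.
        return ((key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1)) + 1;
    }
}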
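Here is a standalone sketch of that idea you can run without Hadoop to check where keys end up. The class name StopWordPartitionDemo, the tiny stop-word list and the String-keyed signature are assumptions for the illustration; a real partitioner would extend Hadoop's Partitioner and take the job's key and value types. The extra check for a single reducer avoids a division by zero in the modulo.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Standalone illustration, no Hadoop on the classpath needed.
class StopWordPartitionDemo {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    static boolean isStopWord(String key) {
        return STOP_WORDS.contains(key);
    }

    static int getPartition(String key, int numReduceTasks) {
        // With a single reducer there is only partition 0 to choose from.
        if (numReduceTasks == 1 || isStopWord(key)) {
            return 0;
        }
        // All other words are hashed over partitions 1 .. numReduceTasks - 1.
        return ((key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1)) + 1;
    }

    public static void main(String[] args) {
        for (int reducers : new int[] {1, 2, 5}) {
            for (String word : new String[] {"the", "hadoop", "reducer"}) {
                System.out.println(reducers + " reducers: " + word
                        + " -> partition " + getPartition(word, reducers));
            }
        }
    }
}
```

The point is that the partitioner works for any number of reducers: stop words always land in partition 0, and the other words use whatever partitions are left.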


This may be an XY problem. I'm not sure that what you're asking about is the best way to solve your actual problem.

answered 2014-09-08T20:25:55.033