I want to sort the mapper output records by their first two fields before they are fed to the reducer. Here is how I set up the job:

hadoop streaming \
-D mapred.job.name="multi_field_key_sort" \
-D mapred.job.map.capacity=100 \
-D mapred.reduce.tasks=1 \
-D stream.num.map.output.key.fields=2 \
-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D mapred.text.key.comparator.options="-k1,2n" \
-input "..." \
-output "..." \
-mapper "..." \
-reducer "cat"

However, the final results are sorted only by the first field, not by the first two. Why? Is something wrong with my Hadoop job configuration?
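
To make the expected ordering concrete, here is a minimal local sketch using Unix sort, whose -k syntax the comparator options appear to be modeled on. The sample records are hypothetical, and it assumes tab-separated fields that are both numeric. It gives each field its own numeric key (-k1,1n -k2,2n) rather than one key spanning both fields; in Unix sort, a single numeric key such as -k1,2n only reads the leading number of the span, which is field 1. Whether KeyFieldBasedComparator treats -k1,2n the same way is an assumption that I have not verified.

```shell
# Hypothetical sample records: two tab-separated numeric fields plus a value.
sample=$(printf '2\t10\tc\n1\t9\ta\n1\t10\tb\n2\t3\td\n')

# One numeric key per field: order by field 1, then break ties on field 2.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2,2n)

printf '%s\n' "$sorted"
```

If the comparator follows the same convention, passing "-k1,1n -k2,2n" as the comparator options might be worth trying in place of "-k1,2n".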
