I'm using the mongo-hadoop adapter to run map/reduce jobs. Everything works except for the launch time and the overall time taken by the job: even when the dataset is very small, the map phase takes 13 seconds and the reduce phase takes 12 seconds. I have changed settings in mapred-site.xml and core-site.xml, but the time taken by map/reduce seems to be constant. Is there any way I can reduce it? I also explored the optimized Hadoop distribution from Hanborq, which uses a worker pool for faster job launch/setup. Is there an equivalent available elsewhere? The Hanborq distribution is not very active: it was last updated 4 months ago and is built on an older version of Hadoop.
Some of my settings are as follows.

mapred-site.xml:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xms1g</value>
</property>
<property>
  <name>mapred.sort.avoidance</name>
  <value>true</value>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
</property>
core-site.xml:
<property>
  <name>io.sort.mb</name>
  <value>300</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>100</value>
</property>
Any help would be greatly appreciated. Thanks in advance.