1)
By default:
Each TaskTracker has a fixed number of map slots and reduce slots (`mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` both default to 2). The common recommendation is to tune these so the total number of slots roughly matches the number of CPU cores on the node.
One slot executes one task at a time (either a Map task or a Reduce task).
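For example, here is a minimal sketch of how the slot counts could be set in `mapred-site.xml` on each TaskTracker node. The property names are the standard MRv1 ones; the 6/2 split for an assumed 8-core node is only illustrative, not an official recommendation:

```xml
<!-- mapred-site.xml on each TaskTracker node (MRv1) -->
<!-- Illustrative values for an 8-core node: 6 map slots + 2 reduce slots -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```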
2)
I think the second approach is better: having multiple tasks run on a single TaskTracker.
From a performance point of view, what really matters is not the number of tasks but the distance between the node running the task (the TaskTracker) and the node holding its input data (the DataNode). If all tasks run on the TaskTracker closest to the data, performance will be better (I'm not sure whether Hadoop reuses the network connection; if it does, that's an added advantage). If the tasks are instead spread across multiple TaskTrackers, some of them could be far from the DataNode, and performance will suffer accordingly. You can check how much data locality a job actually achieved from its counters, as sketched below.
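Here is a minimal sketch of reading a finished job's locality counters. `TOTAL_LAUNCHED_MAPS`, `DATA_LOCAL_MAPS`, and `RACK_LOCAL_MAPS` are real Hadoop `JobCounter` values, but the surrounding class is just an illustration and assumes the new `org.apache.hadoop.mapreduce` API:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class LocalityReport {
    /** Prints how many map tasks ran data-local vs. rack-local vs. remote. */
    public static void report(Job job) throws Exception {
        Counters counters = job.getCounters();
        long total     = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
        long dataLocal = counters.findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
        long rackLocal = counters.findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();

        System.out.printf("maps launched: %d, data-local: %d, rack-local: %d, remote: %d%n",
                total, dataLocal, rackLocal, total - dataLocal - rackLocal);
    }
}
```

If most maps are data-local, the scheduler placed the work next to the HDFS blocks; a high remote count suggests locality, not slot count, is the bottleneck.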