
Hadoop is running on a cluster of 8 nodes. The submitted job produces several key-value objects as mapper output, with different keys (manually checked), so I expect several reducers to be launched to process the data on the nodes.

I don't know why, but as the log reports, the number of launched reduce tasks is always 1. Since there are tens of different keys, I expect to have at least as many reducers as the number of nodes, i.e. 8 (which is also the number of slaves).

This is the log when the job ends:

13/05/25 04:02:31 INFO mapred.JobClient: Job complete: job_201305242051_0051
13/05/25 04:02:31 INFO mapred.JobClient: Counters: 30
13/05/25 04:02:31 INFO mapred.JobClient:   Job Counters 
13/05/25 04:02:31 INFO mapred.JobClient:     Launched reduce tasks=1
13/05/25 04:02:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21415994
13/05/25 04:02:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/05/25 04:02:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/05/25 04:02:31 INFO mapred.JobClient:     Rack-local map tasks=7
13/05/25 04:02:31 INFO mapred.JobClient:     Launched map tasks=33
13/05/25 04:02:31 INFO mapred.JobClient:     Data-local map tasks=26
13/05/25 04:02:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=5486645
13/05/25 04:02:31 INFO mapred.JobClient:   File Output Format Counters 
13/05/25 04:02:31 INFO mapred.JobClient:     Bytes Written=2798
13/05/25 04:02:31 INFO mapred.JobClient:   FileSystemCounters
13/05/25 04:02:31 INFO mapred.JobClient:     FILE_BYTES_READ=2299685944
13/05/25 04:02:31 INFO mapred.JobClient:     HDFS_BYTES_READ=2170126861
13/05/25 04:02:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2879025663
13/05/25 04:02:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2798
13/05/25 04:02:31 INFO mapred.JobClient:   File Input Format Counters 
13/05/25 04:02:31 INFO mapred.JobClient:     Bytes Read=2170123000

Other (useful?) information:

  • each node has 1 core assigned to the job
  • I manually checked that the job is actually running on all 8 nodes
  • I have not set any parameter that would fix the number of reduce tasks to one
  • Hadoop version: 1.1.2

So, do you have any idea why the number of reducers is 1 and not more?

Thanks


2 Answers


You should:

  1. first check whether your cluster supports more than one reducer
  2. specify the number of reduce tasks you want to run

Check the supported reducer count

The most convenient way to check is the JobTracker web UI: http://localhost:50030/machines.jsp?type=active (you may need to replace localhost with the hostname of the machine where the JobTracker is running). It lists every active TaskTracker in the cluster together with the number of reducers each one can run simultaneously.
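The per-TaskTracker reduce capacity shown there is governed by mapred.tasktracker.reduce.tasks.maximum (default 2 in Hadoop 1.x). A sketch of the mapred-site.xml entry, in case you need to raise the per-node capacity:

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>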

Specify the number of reducers

There are three ways to do this:

Specify the number of reducers in code

As zsxwing showed, you can call setNumReduceTasks() on your JobConf, passing the desired number of reduce tasks as the argument.
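A minimal old-API sketch of this call (the driver class name and paths here are illustrative, not from the question):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class MyDriver {
      public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(MyDriver.class);
          conf.setJobName("my-job");

          // Without this call the framework falls back to a single reducer.
          conf.setNumReduceTasks(8);

          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));

          JobClient.runJob(conf);
      }
  }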

Specify the number of reducers on the command line

You can also pass the number of reducers on the command line, like this:

  bin/hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.reduce.tasks=2 teragen teragen_out

The command line above launches 2 reducers.
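Note that -D generic options are only honoured by a driver that runs through GenericOptionsParser, usually via ToolRunner; the bundled examples do this. A minimal sketch for your own jar (class name illustrative):

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyTool extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
          // getConf() already carries any -D overrides parsed by ToolRunner,
          // including mapred.reduce.tasks.
          JobConf conf = new JobConf(getConf(), MyTool.class);
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));
          JobClient.runJob(conf);
          return 0;
      }

      public static void main(String[] args) throws Exception {
          // ToolRunner strips the generic options (-D, -files, ...) before
          // handing the remaining arguments to run().
          System.exit(ToolRunner.run(new MyTool(), args));
      }
  }

With this in place, bin/hadoop jar mytool.jar MyTool -Dmapred.reduce.tasks=8 input output would start 8 reducers.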

In your conf/mapred-site.xml

You can also add a new property to your mapred-site.xml, like this:

  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
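A value set here acts only as a cluster-wide default: a number set in the driver code or passed with -D for a particular job overrides it (unless the property is marked final).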
Answered 2013-06-04T09:37:39.517

No matter how many keys your mappers emit, you have to set the number of reducers yourself (the default is 1). For example, you can use job.setNumReduceTasks(5) to set the number of reduce tasks to 5.
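Matching the 8 reduce slots in the question's cluster, a minimal new-API (org.apache.hadoop.mapreduce) driver might look like this (class and path names are illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class MyJob {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "my-job");
          job.setJarByClass(MyJob.class);

          // One reducer per node, matching the 8 reduce slots in the cluster.
          job.setNumReduceTasks(8);

          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }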

Answered 2013-06-04T09:00:08.993