
I am running the Hadoop TestDFSIO write benchmark on a Hadoop cluster set up across two Red Hat 6.4 Linux systems, but the job hangs after reaching

100% map 16% reduce

I run the TestDFSIO write workload as:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 960 -fileSize 1024

After formatting the namenode, the write runs fine once, but a second run fails again, hanging after the map tasks complete at

100% map 16% reduce.

After formatting the namenode, it is able to complete one run of the write workload:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 960 -fileSize 1024

But when I then run the read workload:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -read -nrFiles 960 -fileSize 1024

it gets stuck at the final stage at:

100% map 16% reduce done.

Why does the reduce task never complete?

The TaskTracker log on the master node shows (timestamps and class names shortened):

...0:15,541 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_226512462 exited with exit code 0. Number of tasks it ran: 1
...0:15,814 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 0.0% reading test_io_8@790197504/1073741824 ::host = 9.122.227.170
...0:16,768 INFO ....TaskTracker: Received KillTaskAction for task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....TaskTracker: About to purge task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....IndexCache: Map ID attempt_201309241959_0001_m_000957_1 not found in cache
...0:17,559 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) > 
...0:18,355 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 1.0% finished test_io_8 ::host = 9.122.227.170
...0:18,355 INFO ....TaskTracker: Task attempt_201309241959_0001_m_000958_0 is done.
...0:18,355 INFO ....TaskTracker: reported output size for attempt_201309241959_0001_m_000958_0  was 93
...0:18,356 INFO ....TaskTracker: addFreeSlot : current free slots : 2
...0:18,498 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_832308806 exited with exit code 0. Number of tasks it ran: 1
...0:20,584 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) > 
...0:21,697 INFO ....TaskTracker.clienttrace: src: 9.122.227.170:50060, dest: 9.122.227.170:48771, bytes: 93, op: MAPRED_SHUFFLE, cliID: attempt_201309241959_0001_m_000958_0, duration: 6041257
...0:26,608 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:32,632 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:35,655 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:41,679 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:47,700 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:50,721 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:56,744 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...0:59,766 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:05,789 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:11,812 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:14,835 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:20,859 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:26,885 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:29,908 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:35,931 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:41,955 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:44,978 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:51,002 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...1:57,025 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:00,048 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:06,072 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:12,096 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:15,119 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:21,143 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:27,167 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 
...2:30,190 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) > 

Screenshot: terminal output showing the Hadoop job stuck in the reduce phase.


1 Answer


I solved it. The problem was DNS name resolution: I had to edit the /etc/hosts file, remove the localhost entry, and add the actual hostnames so that the Hadoop cluster could resolve the Hadoop master and the Hadoop slave. This definitely worked for me on Red Hat Linux 6.4.
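
As a minimal sketch of what that change might look like for a two-node cluster: the key point is that each node's real hostname resolves to its real interface address rather than to 127.0.0.1, otherwise the reducer cannot fetch map output from the other TaskTracker, which matches the "reduce > copy ... at 0.00 MB/s" stall in the log above. Only the IP 9.122.227.170 comes from the logs; the hostnames hadoop-master / hadoop-slave1 and the second IP are assumptions for illustration.

# /etc/hosts on every node (sketch, not the exact file from the answer)
9.122.227.170   hadoop-master     # IP seen in the logs above; assumed master hostname
9.122.227.171   hadoop-slave1     # assumed IP and hostname of the second node

With Hadoop 1.x the file usually has to be fixed on both nodes, and the daemons restarted (for example with stop-all.sh and start-all.sh) so that the JobTracker and TaskTrackers re-register under the resolvable hostnames.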

Answered on 2013-10-11T09:49:12.053