I am running the Hadoop TestDFSIO write benchmark on a Hadoop cluster configured across two Red Hat 6.4 Linux systems, but the job hangs at:

100% map 16% reduce
I run the TestDFSIO write workload as:
hadoop jar hadoop-test-1.2.1.jar -write -nrFiles 960 -fileSize 1024
After formatting the namenode, the job runs fine once, but fails again on the second run, hanging after the map tasks complete at 100% map 16% reduce. In other words, after formatting the namenode it is able to finish one write run:

hadoop jar hadoop-test-1.2.1.jar -write -nrFiles 960 -fileSize 1024
But when I then run the read workload:
hadoop jar hadoop-test-1.2.1.jar -read -nrFiles 960 -fileSize 1024
it gets stuck at the final stage:

100% map 16% reduce

Why do the reduce tasks fail to complete?
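For reference, here is a sketch of how I believe the benchmark is normally driven; the explicit `TestDFSIO` driver class name and the `-clean` option are my assumptions about the standard Hadoop 1.x test jar, not something verified on my cluster:

```shell
# Write benchmark: creates 960 files of 1024 MB each under /benchmarks/TestDFSIO
hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 960 -fileSize 1024

# Read benchmark: reads back the files produced by the write run
hadoop jar hadoop-test-1.2.1.jar TestDFSIO -read -nrFiles 960 -fileSize 1024

# Remove the benchmark's output directory before re-running
hadoop jar hadoop-test-1.2.1.jar TestDFSIO -clean
```

These commands only make sense against a running cluster, so treat them as an assumed invocation pattern rather than a tested recipe.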
The TaskTracker log on the master node shows (timestamps and class names shortened):
...0:15,541 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_226512462 exited with exit code 0. Number of tasks it ran: 1
...0:15,814 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 0.0% reading test_io_8@790197504/1073741824 ::host = 9.122.227.170
...0:16,768 INFO ....TaskTracker: Received KillTaskAction for task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....TaskTracker: About to purge task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....IndexCache: Map ID attempt_201309241959_0001_m_000957_1 not found in cache
...0:17,559 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) >
...0:18,355 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 1.0% finished test_io_8 ::host = 9.122.227.170
...0:18,355 INFO ....TaskTracker: Task attempt_201309241959_0001_m_000958_0 is done.
...0:18,355 INFO ....TaskTracker: reported output size for attempt_201309241959_0001_m_000958_0 was 93
...0:18,356 INFO ....TaskTracker: addFreeSlot : current free slots : 2
...0:18,498 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_832308806 exited with exit code 0. Number of tasks it ran: 1
...0:20,584 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) >
...0:21,697 INFO ....TaskTracker.clienttrace: src: 9.122.227.170:50060, dest: 9.122.227.170:48771, bytes: 93, op: MAPRED_SHUFFLE, cliID: attempt_201309241959_0001_m_000958_0, duration: 6041257
...0:26,608 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
...0:32,632 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
...0:35,655 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
... (the same "reduce > copy (479 of 960 at 0.00 MB/s)" line repeats every few seconds, with no progress, for roughly the next two minutes) ...
...2:30,190 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
A screenshot shows the Hadoop process stuck in the job's reduce phase.