I need to run my mapper and reducer functions, which are two runnable jars, using Hadoop streaming. I have written bash scripts that run these jar files. This is the command I use for Hadoop streaming:
bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.1.jar \
-D stream.non.zero.exit.is.failure=false -file /home/abhinav/mapper.jar \
-file /home/abhinav/reducer.jar -mapper "./mapper.sh | sort -k1,1" \
-reducer "reducer.sh" -file /home/abhinav/mapper.sh -file /home/abhinav/reducer.sh \
-input /home/abhinav/stemp/dfs/name/input.txt -output /home/abhinav/stemp/dfs/name/op30
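To tell script problems apart from cluster problems, the same mapper | sort | reducer chain can first be run on a single machine. This is a minimal sketch: the function name `run_local` is hypothetical, and it assumes mapper.sh, reducer.sh and both jars sit in the current directory, which is where `-file` entries are expected to appear in the task's working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "mapper | sort -k1,1 | reducer" locally on one input file.
# Assumes mapper.sh and reducer.sh (and the jars they call) are in the current directory.
run_local() {
  local input="$1"
  # pipefail makes a failing mapper (e.g. "Unable to access jarfile") show up as a
  # non-zero exit status instead of being masked by a successful sort/reducer.
  ( set -o pipefail
    bash ./mapper.sh < "$input" | sort -k1,1 | bash ./reducer.sh )
}
```

Usage: `run_local input.txt > local_output.txt; echo "exit status: $?"`. Note that this only checks the scripts and jars themselves; it does not reproduce how Hadoop localizes `-file` entries on the task nodes.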
This is what my mapper.sh looks like:
java -jar mapper.jar
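One way to make this failure easier to diagnose is to have the wrapper check for the jar before starting Java and, if it is missing, print the working directory and its contents to stderr, which streaming copies into the task's stderr log. A sketch, assuming bash on the task nodes; `run_jar` is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/bash
# Hypothetical helper for mapper.sh / reducer.sh: start a jar only if it is
# readable from the task's current working directory; otherwise report why not.
run_jar() {
  local jar="$1"
  if [ ! -r "$jar" ]; then
    # Stderr of the streaming subprocess ends up in the task's stderr log.
    echo "run_jar: cannot read $jar in $(pwd)" >&2
    ls -la >&2   # show what was actually placed in the working directory
    return 1
  fi
  # exec replaces the shell, so java reads the mapper's stdin directly.
  exec java -jar "$jar"
}

# mapper.sh would then end with:
# run_jar mapper.jar
```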
I get the following error in my task logs:
Unable to access jarfile mapper.jar
I don't understand why my mapper.jar cannot be accessed on the cluster.
Can anyone help?
Here are the logs:
2013-08-21 11:41:18,636 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-08-21 11:41:18,760 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/reducer.sh <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/reducer.sh
2013-08-21 11:41:18,770 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/job.jar <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/job.jar
2013-08-21 11:41:18,772 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/.job.jar.crc <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/.job.jar.crc
2013-08-21 11:41:18,776 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/lib <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/lib
2013-08-21 11:41:18,779 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/META-INF <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/META-INF
2013-08-21 11:41:18,783 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/org <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/org
2013-08-21 11:41:18,788 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/jars/mapper.sh <- /home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/mapper.sh
2013-08-21 11:41:19,050 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-08-21 11:41:19,218 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-08-21 11:41:19,226 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@136a1a1
2013-08-21 11:41:19,351 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2013-08-21 11:41:19,361 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-08-21 11:41:19,373 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-08-21 11:41:21,294 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-08-21 11:41:21,294 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2013-08-21 11:41:21,367 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/home/abhinav/stemp/mapred/local/taskTracker/abhinav/jobcache/job_201308211117_0008/attempt_201308211117_0008_m_000000_0/work/./mapper.sh]
2013-08-21 11:41:21,412 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2013-08-21 11:41:21,414 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2013-08-21 11:41:21,415 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(Unknown Source)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.DataOutputStream.flush(Unknown Source)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-08-21 11:41:21,416 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exited with code 1 in org.apache.hadoop.streaming.PipeMapRed
2013-08-21 11:41:21,416 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2013-08-21 11:41:21,416 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-08-21 11:41:21,423 INFO org.apache.hadoop.mapred.Task: Task:attempt_201308211117_0008_m_000000_0 is done. And is in the process of commiting
2013-08-21 11:41:21,592 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201308211117_0008_m_000000_0' done.
2013-08-21 11:41:21,632 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-21 11:41:21,909 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-08-21 11:41:21,909 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName abhinav for UID 1000 from the native implementation
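In this log, the "Creating symlink" lines show what was placed into the attempt's work directory: reducer.sh, job.jar, .job.jar.crc, lib, META-INF, org and mapper.sh, but neither mapper.jar nor reducer.jar. The "Broken pipe" warning comes right before the note that the subprocess exited with code 1, which is consistent with mapper.sh failing before it read its input. The sketch below extracts the symlinked names from a saved task log; the function name `list_localized` and the log file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print, one per line, the names that were symlinked into
# the task's work directory, based on "Creating symlink" lines in a task log.
list_localized() {
  local log="$1"
  # Each matching line looks like "...Creating symlink: <target> <- <link>";
  # keep the link path (after the last "<- ") and reduce it to its basename.
  grep 'Creating symlink:' "$log" | sed 's/.*<- //' | while read -r link; do
    basename "$link"
  done
}
```

Usage: `list_localized task.log | grep -qx 'mapper.jar' || echo "mapper.jar was not localized"`.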