嘿,我对大数据世界还很陌生。我在 http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/上看到了这个教程
它详细描述了如何在本地和 Elastic Map Reduce 上使用 mrjob 运行 MapReduce 作业。
好吧,我正在尝试在我自己的 Hadoop 集群上运行它。我使用以下命令运行了这项工作。
python density.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop > outputmusic
这就是我得到的:
HADOOP: Running job: job_1369345811890_0245
HADOOP: Job job_1369345811890_0245 running in uber mode : false
HADOOP: map 0% reduce 0%
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Container killed by the ApplicationMaster.
HADOOP:
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: map 100% reduce 0%
HADOOP: Job job_1369345811890_0245 failed with state FAILED due to: Task failed task_1369345811890_0245_m_000001
HADOOP: Job failed as tasks failed. failedMaps:1 failedReduces:0
HADOOP:
HADOOP: Counters: 6
HADOOP: Job Counters
HADOOP: Failed map tasks=7
HADOOP: Launched map tasks=8
HADOOP: Other local map tasks=6
HADOOP: Data-local map tasks=2
HADOOP: Total time spent by all maps in occupied slots (ms)=32379
HADOOP: Total time spent by all reduces in occupied slots (ms)=0
HADOOP: Job not Successful!
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.1.jar] /tmp/streamjob3272348678857116023.jar tmpDir=null
Traceback (most recent call last):
File "density.py", line 34, in <module>
MRDensity.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 344, in run
mr_job.run_job()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 381, in run_job
runner.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/runner.py", line 316, in run
self._run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 175, in _run
self._run_job_in_hadoop()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 325, in _run_job_in_hadoop
raise CalledProcessError(step_proc.returncode, streaming_args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/input', '-output', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/output', '-cacheFile', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/density.py#density.py', '-cacheArchive', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/mrjob.tar.gz#mrjob.tar.gz', '-mapper', 'python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value', '-jobconf', 'mapred.reduce.tasks=0']' returned non-zero exit status 1
注意:正如我在其他一些论坛中所建议的那样
#! /usr/bin/python
在我的 python 文件 density.py 和 track.py 的开头。它似乎对大多数人都有效,但我仍然继续得到上述异常。
编辑:我包含了原始 density.py 中使用的函数之一的定义,该函数在 density.py 本身的另一个文件 track.py 中定义。作业成功运行。但是,如果有人知道为什么会这样,那真的很有帮助。