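I have a Hadoop job that is scheduled by Oozie and runs a Pig script. The problem is that the job stays in pending status indefinitely, and I can't see any explicit errors or exceptions in the jobtracker/tasktracker logs.
Has anyone had a similar experience and can suggest how to identify the root cause?
Here is the log from the job tracker. I found nothing about this job in the task tracker log: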
2012-05-09 14:57:19,552 INFO org.apache.hadoop.mapred.JobQueuesManager: Job job_201205091453_0007 submitted to queue daily
2012-05-09 14:57:19,552 INFO org.apache.hadoop.mapred.JobTracker: Job job_201205091453_0007 added successfully for user 'mapred' to queue 'daily'
2012-05-09 14:57:19,552 INFO org.apache.hadoop.mapred.AuditLogger: USER=mapred IP=10.40.31.234 OPERATION=SUBMIT_JOB TARGET=job_201205091453_0007 RESULT=SUCCESS
2012-05-09 14:57:22,966 INFO org.apache.hadoop.mapred.JobInitializationPoller: Passing to Initializer Job Id :job_201205091453_0007 User: mapred Queue : daily
2012-05-09 14:57:24,086 INFO org.apache.hadoop.mapred.JobInitializationPoller: Initializing job : job_201205091453_0007 in Queue daily For user : mapred
2012-05-09 14:57:24,086 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201205091453_0007
2012-05-09 14:57:24,086 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201205091453_0007
2012-05-09 14:57:24,239 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /var/lib/hadoop-0.20/system/job_201205091453_0007/jobToken
2012-05-09 14:57:24,243 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201205091453_0007 = 48. Number of splits = 1
2012-05-09 14:57:24,243 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201205091453_0007_m_000000 has split on node:/default-rack/hzs-ubt-elou
2012-05-09 14:57:24,243 INFO org.apache.hadoop.mapred.JobInProgress: job_201205091453_0007 LOCALITY_WAIT_FACTOR=1.0
2012-05-09 14:57:24,243 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201205091453_0007 initialized successfully with 1 map tasks and 1 reduce tasks.
See the screenshot below: the map/reduce tasks have been in pending status for more than 21 hours.