我遇到了一个奇怪的问题。我有一个 mapreduce 类,它在文件中查找模式(模式文件进入 DistributedCache)。现在我想重用这个类来运行 1000 个模式文件。我只需要扩展模式匹配类并覆盖它的 main 和 run 函数。在子类的运行中,我修改了命令行参数并将它们提供给父类的 run() 函数。一切顺利,直到迭代 45-50。突然,所有的任务跟踪器都开始失败,直到没有任何进展。我检查了 HDFS,但还剩下 70% 的空间。有没有人知道为什么推出 50 个工作,一个一个地给 hadoop 带来困难?
@Override
public int run(String[] args) throws Exception {
//-patterns patternsDIR input/ output/
List<String> files = getFiles(args[1]);
String inputDataset=args[2];
String outputDir=args[3];
for (int i=0; i<files.size(); i++){
String [] newArgs= new String[4];
newArgs = modifyArgs(args);
super.run(newArgs);
}
return 0;
}
编辑:刚刚检查了作业日志,这是发生的第一个错误:
2013-11-12 09:03:01,665 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:03:32,971 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000053_0' has completed task_201311120807_0053_m_000053 successfully.
2013-11-12 09:07:51,717 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:08:05,973 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000128_0' has completed task_201311120807_0053_m_000128 successfully.
2013-11-12 09:08:16,571 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000130_0' has completed task_201311120807_0053_m_000130 successfully.
2013-11-12 09:08:16,571 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1595161181_30] for 30 seconds. Will retry shortly ...
2013-11-12 09:08:27,175 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000138_0' has completed task_201311120807_0053_m_000138 successfully.
2013-11-12 09:08:25,241 ERROR org.mortbay.log: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:08:25,241 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@7fcb9c0a, false, false, true, 9834) from 10.1.1.13:55028: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:62)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at org.apache.hadoop.util.StringUtils.escapeString(StringUtils.java:435)
at org.apache.hadoop.mapred.Counters.escape(Counters.java:768)
at org.apache.hadoop.mapred.Counters.access$000(Counters.java:52)
at org.apache.hadoop.mapred.Counters$Counter.makeEscapedCompactString(Counters.java:111)
at org.apache.hadoop.mapred.Counters$Group.makeEscapedCompactString(Counters.java:221)
at org.apache.hadoop.mapred.Counters.makeEscapedCompactString(Counters.java:648)
at org.apache.hadoop.mapred.JobHistory$MapAttempt.logFinished(JobHistory.java:2276)
at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2636)
at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1222)
at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
2013-11-12 09:08:16,571 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3269c671, false, false, true, 9841) from 10.1.1.23:42125: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2875)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3806)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:290)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:294)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:253)
at java.io.PrintWriter.flush(PrintWriter.java:293)
at java.io.PrintWriter.checkError(PrintWriter.java:330)
at org.apache.hadoop.mapred.JobHistory.log(JobHistory.java:847)
at org.apache.hadoop.mapred.JobHistory$MapAttempt.logStarted(JobHistory.java:2225)
at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2632)
at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1222)
at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
之后我们看到一堆:
2013-11-12 09:13:48,204 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311120807_0053_m_000033_0: Lost task tracker: tracker_n144-06b.wall1.ilabt.iminds.be:localhost/127.0.0.1:47567
EDIT2:一些想法?
- 堆空间错误有点出乎意料,因为映射器几乎不需要任何内存。
- 我用 super.run() 调用基类,我应该为此使用 Toolrunner 调用吗?
- 在每次迭代中,大约 1000 个单词 + 分数的文件被添加到 DistributedCache,我不确定是否应该在某处重置缓存?(super.run() 中的每个作业都使用 job.waitForCompletion() 运行,那么缓存是否被清除?)
编辑3:
@Donald:我没有调整hadoop守护进程的内存大小,所以它们每个应该有1GB的堆。maptasks 有 800 MB 的堆,其中 450 MB 用于 io.sort。
@Chris:我没有修改柜台上的任何东西,我正在使用普通的。有 1764 个地图任务,每个任务有 16 个计数器,而工作本身将有另外 20 个左右。这可能确实在 50 个连续作业之后加起来,但是如果您正在运行多个连续作业,我认为它不会存储在堆中?
@额外的信息:
- 地图任务非常快,每个任务只需要 3-5 秒,我有 jvm.reuse=-1。一个 map 任务处理一个有 10 条记录的文件(文件远小于块大小)。由于文件较小,我可以考虑制作包含 100 条记录的输入文件以减少映射开销。
- 我尝试的第一件事是添加一个单位缩减器(1 个缩减任务)以减少在 HDFS 中创建的文件数量(否则每个模式将有 1 个,因此每个作业有 1000 个,这可能会为数据节点产生开销)
- 每个作业的记录数相当低,我正在寻找 1764 个文件中的特定单词,并且与 1000 个模式之一的匹配数总共约为 5000 个地图输出记录)
@All:谢谢你们帮助我!