I'm implementing a BFS algorithm on YARN (with Giraph), and I created a custom value type for the data stored on my vertices (the vertex data). After I did that, reading the edges started to fail.
I traced the error to the following lines of code:
In ByteArrayEdges, the variable
serializedEdgesBytesUsed
gets the value 1987015248
and allocating the new array then fails with an OutOfMemory error (as far as I know, Java has a 64K limit):

    @Override
    public void readFields(DataInput in) throws IOException {
      serializedEdgesBytesUsed = in.readInt();
      if (serializedEdgesBytesUsed > 0) {
        // Only create a new buffer if the old one isn't big enough
        if (serializedEdges == null ||
            serializedEdgesBytesUsed > serializedEdges.length) {
          serializedEdges = new byte[serializedEdgesBytesUsed];
        }
        in.readFully(serializedEdges, 0, serializedEdgesBytesUsed);
      }
      edgeCount = in.readInt();
    }
I'm not sure why this happens, but the problem did not exist before I started using the custom vertex data.
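As a sanity check on that number, here is a small standalone sketch (not Giraph code; the class and method names are mine) that decodes the four big-endian bytes of 1987015248, the same byte order `DataInput.readInt()` uses. They turn out to be printable ASCII, which suggests, but does not prove, that `readInt()` is consuming text bytes rather than a real length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeEdgeLength {
    // Interpret the four big-endian bytes of an int as ASCII text.
    // ByteBuffer is big-endian by default, matching DataInput.readInt().
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int serializedEdgesBytesUsed = 1987015248; // value seen in the debugger
        System.out.println(Integer.toHexString(serializedEdgesBytesUsed)); // 766f7250
        System.out.println(asAscii(serializedEdgesBytesUsed));             // vorP
    }
}
```

If the bytes really are part of a string, my guess is that the text belongs to my custom vertex value and is not being fully consumed before the edges are read.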
The full log is below (I'm testing directly from Eclipse, because it is much harder in a pseudo-distributed cluster):
2015-08-20 01:52:21,103 INFO [LocalJobRunner Map Task Executor #0] utils.ProgressableUtils (ProgressableUtils.java:waitFor(315)) - waitFor: Future result not ready yet java.util.concurrent.FutureTask@b2dd686
2015-08-20 01:52:21,103 INFO [LocalJobRunner Map Task Executor #0] utils.ProgressableUtils (ProgressableUtils.java:waitFor(197)) - waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
2015-08-20 01:53:12,527 ERROR [LocalJobRunner Map Task Executor #0] graph.GraphMapper (GraphMapper.java:run(101)) - Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.giraph.edge.ByteArrayEdges.readFields(ByteArrayEdges.java:193)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:541)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
... 4 more
2015-08-20 01:53:12,532 ERROR [LocalJobRunner Map Task Executor #0] worker.BspServiceWorker (BspServiceWorker.java:unregisterHealth(777)) - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_local1113753160_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/localhost_0 on superstep -1
2015-08-20 01:53:12,558 INFO [Thread-13] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2015-08-20 01:53:12,562 WARN [Thread-13] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1113753160_0001
java.lang.Exception: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:104)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6e5efd25
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
... 8 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.giraph.edge.ByteArrayEdges.readFields(ByteArrayEdges.java:193)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:541)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
... 4 more
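The trace shows ByteArrayEdges.readFields being called from WritableUtils.reinitializeVertexFromDataInput, so the edges are read from the same stream as the rest of the vertex. One thing I have not ruled out is that my custom value writes more bytes than its readFields consumes. The sketch below is only an illustration of that assumption using plain java.io streams, not my actual class: the layout (a double followed by a 4-byte tag "abcd") and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MisalignedReadDemo {
    static final int REAL_EDGE_BYTES = 12; // length the edge reader should see

    // Hypothetical layout: vertex value (double + 4-byte tag), then edge length.
    static byte[] serializeVertex() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeDouble(1.0);          // 8 bytes of the value
        out.writeBytes("abcd");        // 4 more bytes of the value, one per char
        out.writeInt(REAL_EDGE_BYTES); // stands in for serializedEdgesBytesUsed
        out.flush();
        return buffer.toByteArray();
    }

    // Reads the value, then returns the int the edge reader would see next.
    // With readWholeValue == false the 4-byte tag is left in the stream.
    static int readEdgeLength(boolean readWholeValue) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(serializeVertex()));
            in.readDouble();
            if (readWholeValue) {
                in.readFully(new byte[4]); // consume the tag as well
            }
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readEdgeLength(true));  // 12
        System.out.println(readEdgeLength(false)); // 1633837924
    }
}
```

In the asymmetric case the leftover tag bytes are read as the length, which gives a value of the same order of magnitude as the one I'm seeing and would explain the huge allocation.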
The command line used to run this is:
$HADOOP_HOME/bin/yarn jar $GIRAPH_HOME/gaph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar algoritmos.masivos.BusquedaDeCaminosNavegacionalesWikiquotesMasivo lectura_de_grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif pruebas.IdTextWithValueDoubleInputFormat -vip /user/hduser/input/wiki-graph-chiquito.txt -vof pruebas.IdTextWithValueTextOutputFormat -op /user/hduser/output/caminosNavegacionales -w 2 -yh 250
Maybe I should use an EdgeInputFormat?
Thanks for reading.