I am running a Pig script on about 1 GB of data; it involves several GROUP BY and FOREACH statements. Here is the sample Pig code:
ab = GROUP y BY (y1, y2, y3, y4, y5, y6);
xy = FOREACH ab {
    abc = FOREACH y GENERATE x1, x2, x3, x4, x5, x6, rel1, rel2;
    GENERATE group, abc;
};
Note: rel1 and rel2 are generated per group and are themselves bags. The job fails with "GC overhead limit exceeded".
YARN logs:
2018-08-08 15:01:13,299 INFO [Service Thread] org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 1148190720(1121280K) used = 5726479864(5592265K) committed = 5726797824(5592576K) max = 5726797824(5592576K), toFree = 3046581752
2018-08-08 15:04:22,192 FATAL [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-08-08 15:05:24,112 INFO [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
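
To make the numbers in the first log line easier to read, here is a minimal Python sketch that pulls out the byte counts and reports how full the heap was when the SpillableMemoryManager first fired. The helper name `parse_heap_fields` is hypothetical, and the line is shortened to the memory fields only; the values are copied from the log above.

```python
import re

# Memory fields copied from the first SpillableMemoryManager log line above.
LOG_LINE = (
    "Collection threshold init = 1148190720(1121280K) "
    "used = 5726479864(5592265K) committed = 5726797824(5592576K) "
    "max = 5726797824(5592576K), toFree = 3046581752"
)


def parse_heap_fields(line):
    """Return the byte counts that appear as 'name = <bytes>' in the line."""
    pairs = re.findall(r"(init|used|committed|max|toFree) = (\d+)", line)
    return {name: int(value) for name, value in pairs}


fields = parse_heap_fields(LOG_LINE)
GIB = 2 ** 30

# Fraction of the maximum heap that was already in use.
utilization = fields["used"] / fields["max"]

print(f"used      : {fields['used'] / GIB:.2f} GiB")
print(f"max       : {fields['max'] / GIB:.2f} GiB")
print(f"toFree    : {fields['toFree'] / GIB:.2f} GiB")
print(f"heap usage: {utilization:.2%}")
```

Read this way, the task's heap (about 5.33 GiB maximum) was essentially full at the first memory handler call, a few minutes before the OutOfMemoryError.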