H2O 苏打水经常抛出异常,每当发生这种情况时,我们都会手动重新运行它。问题是发生此异常时火花作业不退出,它们不返回退出状态,我们无法自动化此过程。
App > Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 316 in stage 22.0 failed 4 times, most recent failure: Lost task 316.3 in stage 22.0 (TID 9470, ip-**-***-***-**.ec2.internal): java.lang.ArrayIndexOutOfBoundsException: 65535
App > at water.DKV.get(DKV.java:202)
App > at water.DKV.get(DKV.java:175)
App > at water.Key.get(Key.java:83)
App > at water.fvec.Frame.createNewChunks(Frame.java:896)
App > at water.fvec.FrameUtils$class.createNewChunks(FrameUtils.scala:43)
App > at water.fvec.FrameUtils$.createNewChunks(FrameUtils.scala:70)
App > at org.apache.spark.h2o.backends.internal.InternalWriteConverterContext.createChunks(InternalWriteConverterContext.scala:28)
App > at org.apache.spark.h2o.converters.SparkDataFrameConverter$class.org$apache$spark$h2o$converters$SparkDataFrameConverter$$perSQLPartition(SparkDataFrameConverter.scala:86)
App > at org.apache.spark.h2o.converters.SparkDataFrameConverter$$anonfun$toH2OFrame$1$$anonfun$apply$2.apply(SparkDataFrameConverter.scala:67)
App > at org.apache.spark.h2o.converters.SparkDataFrameConverter$$anonfun$toH2OFrame$1$$anonfun$apply$2.apply(SparkDataFrameConverter.scala:67)
App > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
App > at org.apache.spark.scheduler.Task.run(Task.scala:85)
App > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
App > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
App > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)