When calling H2OContext.getOrCreate with a valid SparkContext, we randomly see deployment failures:

17/04/21 17:21:32 ERROR TaskSchedulerImpl: Lost executor 0 on 172.17.0.4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/04/21 17:21:38 ERROR LiveListenerBus: Listener ExecutorAddNotSupportedListener threw an exception
java.lang.IllegalArgumentException: Executor without H2O instance discovered, killing the cloud!
    at org.apache.spark.listeners.ExecutorAddNotSupportedListener.onExecutorAdded(H2OSparkListener.scala:27)
    at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:61)
    at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
    at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
    at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1252)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77) 

The H2OContext.getOrCreate call that triggers the error:

Context.spark_session = SparkSession.builder.getOrCreate()
Context.h2o_context = H2OContext.getOrCreate(Context.spark_session)

Any ideas, H2O crew?

1 Answer

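This is currently a known behavior of the Sparkling Water internal backend. In the internal backend, H2O nodes run inside the Spark executors, so when an executor is lost and Spark adds a replacement, the new executor has no H2O instance and the listener kills the cloud, as the log in the question shows. To avoid this, you can use the external Sparkling Water backend. More information is available here: https://github.com/h2oai/sparkling-water/blob/master/doc/backends.md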
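As a rough illustration of switching backends, the sketch below builds spark-submit arguments that select the external backend. It assumes an H2O cluster has already been started separately. The property names (`spark.ext.h2o.backend.cluster.mode`, `spark.ext.h2o.cloud.name`, `spark.ext.h2o.cloud.representative`) are recalled from the Sparkling Water configuration documentation and may differ between releases, so please check them against the backends document for your version. The cluster name and address are placeholders, and the two helper functions are hypothetical names used only for this example.

```python
# Sketch only: property names are assumptions to verify against the
# Sparkling Water documentation for the version you run.


def external_backend_conf(cluster_name, representative):
    """Return Spark properties that select the external H2O backend.

    cluster_name and representative ("ip:port" of one H2O node) are
    placeholders for an H2O cluster that is already running.
    """
    return {
        "spark.ext.h2o.backend.cluster.mode": "external",
        "spark.ext.h2o.cloud.name": cluster_name,
        "spark.ext.h2o.cloud.representative": representative,
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder cluster name and node address.
    conf = external_backend_conf("my-h2o-cluster", "10.0.0.1:54321")
    print(" ".join(to_submit_args(conf)))
```

The printed arguments can be appended to the spark-submit command that launches the application. With the external backend the H2O nodes no longer live inside the Spark executors, so losing an executor should not take the H2O cluster down with it.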

I am currently working on a JIRA that should also eliminate the behavior described above. It is in progress; you can track its status at https://0xdata.atlassian.net/browse/SW-369.

Answered 2017-06-12T15:40:47.173