
In my application, a Java Spark context is created with a master URL that is not reachable (you may assume the master is down for maintenance). Creating the Java Spark context brings down the JVM running the Spark driver; the JVM exits with exit code 50.

When I checked the logs, I found that SparkUncaughtExceptionHandler had called System.exit. My program is supposed to run forever. How can I overcome this?

I have tried this scenario with Spark versions 1.4.1 and 1.6.0.

My code is given below:

package test.mains;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckJavaSparkContext {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {

        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        conf.setMaster("spark://sunshine:7077");

        try {
            // Try to create the context against the unavailable master URL.
            new JavaSparkContext(conf);
        } catch (Throwable e) {
            System.out.println("Caught an exception : " + e.getMessage());
            //e.printStackTrace();
        }

        System.out.println("Waiting to complete...");
        // Busy-wait so that the driver keeps running forever.
        while (true) {
        }
    }

}

Part of the output log:

16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
    at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1730)
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd
Java Result: 50

1 Answer


If the application master is down, the application itself will try to connect to the master three times, with a 20 second timeout. These values appear to be hardcoded and not configurable. If the application fails to connect, there is nothing more you can do than resubmit it once the master is back up.
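Note that catching the Throwable in main, as in the question, does not help: SparkUncaughtExceptionHandler calls System.exit from the registration-retry thread, which is why the JVM still dies with code 50. If you go the "resubmit it" route, one option is to drive the submission from a small wrapper process. Below is a rough sketch, not a definitive implementation, using the SparkLauncher API that ships with Spark 1.4+; the Spark home, jar path, main class and retry delay are placeholders you would need to adjust:

import org.apache.spark.launcher.SparkLauncher;

public class ResubmitWrapper {

    public static void main(String[] args) throws Exception {
        while (true) {
            // Launch the real driver as a separate spark-submit process.
            Process driver = new SparkLauncher()
                    .setSparkHome("/opt/spark")                      // placeholder
                    .setAppResource("/path/to/my-app.jar")           // placeholder
                    .setMainClass("test.mains.CheckJavaSparkContext")
                    .setMaster("spark://sunshine:7077")
                    .launch();

            // In a real wrapper the child's stdout/stderr should be drained or
            // redirected; omitted here to keep the sketch short.
            int exitCode = driver.waitFor();
            if (exitCode == 0) {
                break; // driver finished normally
            }

            // Exit code 50 (SparkUncaughtExceptionHandler) or any other failure:
            // wait a while and resubmit, hoping the master is back by then.
            System.out.println("Driver exited with " + exitCode + ", resubmitting...");
            Thread.sleep(60_000);
        }
    }
}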

That is why you should configure your cluster in high availability mode. Spark Standalone supports two different modes: single-node recovery with a local file system, and standby masters with ZooKeeper.

The second option (standby masters with ZooKeeper) should be applicable in production and is useful in the described scenario.
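For completeness, here is a rough sketch of what the driver side could look like once standby masters are in place, assuming the cluster has already been configured per the Spark Standalone high availability documentation; the standby host name and ZooKeeper quorum below are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckJavaSparkContextHA {

    public static void main(String[] args) {
        // Cluster side (spark-env.sh on every master; host names are placeholders):
        //   SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
        //                           -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"
        //
        // Application side: list all masters in the URL so the driver can register
        // with whichever one is currently the leader.
        SparkConf conf = new SparkConf()
                .setAppName("test")
                .setMaster("spark://sunshine:7077,sunshine-standby:7077");

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Connected, default parallelism: " + sc.defaultParallelism());
        sc.stop();
    }
}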

Answered 2016-03-05T10:26:28.150