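The Spark application is deployed in standalone cluster mode with supervision enabled.
During high-availability testing, the rack hosting the driver instance lost power (an ungraceful shutdown). The Spark Master did not notice that the driver and the application had been killed, and it kept launching executors for the application for about 15 minutes.
Master logs (the following was logged for 15 minutes):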
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Launching executor app-20180309175053-0002/5202 on worker worker-20180309171520-10.247.247.191-51426
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Removing executor app-20180309175053-0002/5153 because it is EXITED
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Launching executor app-20180309175053-0002/5203 on worker worker-20180309171632-10.247.247.156-57784
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Removing executor app-20180309175053-0002/5155 because it is EXITED
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Launching executor app-20180309175053-0002/5204 on worker worker-20180309123802-10.247.247.121-45652
2018-03-09 18:09:02 INFO org.apache.spark.internal.Logging$class:54 - Removing executor app-20180309175053-0002/5157 because it is EXITED
After the 15th minute:
2018-03-09 18:09:16 WARN org.apache.spark.internal.Logging$class:66 - Got status update for unknown executor app-20180309175053-0002/5282
2018-03-09 18:09:16 WARN org.apache.spark.internal.Logging$class:66 - Got status update for unknown executor app-20180309175053-0002/5295
2018-03-09 18:09:16 WARN org.apache.spark.internal.Logging$class:66 - Got status update for unknown executor app-20180309175053-0002/5296
2018-03-09 18:09:16 WARN org.apache.spark.internal.Logging$class:66 - Got status update for unknown executor app-20180309175053-0002/5289
2018-03-09 18:09:16 WARN org.apache.spark.internal.Logging$class:66 - Got status update for unknown executor app-20180309175053-0002/5277
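To put a number on the executor churn, the master log can be tallied per application. The following is only a rough sketch that assumes the message wording shown in the excerpts above; the function name count_executor_events and the regular expression are my own and are not part of Spark.

```python
import re
from collections import Counter

# Assumption: the master log uses the message wording quoted in this post.
EVENT_RE = re.compile(
    r"(?P<event>Launching executor|Removing executor|"
    r"Got status update for unknown executor)"
    r" (?P<app>app-\d+-\d+)/(?P<exec_id>\d+)"
)


def count_executor_events(lines):
    """Count executor events per (application ID, event type).

    Hypothetical helper for illustration; lines that do not match
    the assumed message wording are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group("app"), match.group("event"))] += 1
    return counts
```

Passing an open log file to count_executor_events (a file object iterates line by line) gives a per-application count of launches and removals, which makes it easier to see how long the relaunch loop ran and when it stopped.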
Executor logs:
2018-03-09 18:50:17 INFO org.apache.spark.internal.Logging$class:54 - Asked to kill executor app-20180309180931-0004/50
2018-03-09 18:50:17 INFO org.apache.spark.internal.Logging$class:54 - Runner thread for executor app-20180309180931-0004/50 interrupted
2018-03-09 18:50:17 INFO org.apache.spark.internal.Logging$class:54 - Killing process!
2018-03-09 18:50:17 INFO org.apache.spark.internal.Logging$class:54 - Executor app-20180309180931-0004/50 finished with state KILLED exitStatus 143
I checked the Spark Master code but could not find much there.
Any help is appreciated. Thanks.