Logs I get from the executor (reading from the bottom up):
2021-11-30 18:44:42,911 INFO [shutdown-hook-0] util.ShutdownHookManager (Logging.scala:logInfo(57)) - Deleting directory /var/data/spark-0646270c-a2d0-47d4-8e6c-0bc735bc255d/spark-a54cf7e4-baaf-4411-9073-0c1fb1e4cc5b
2021-11-30 18:44:42,910 INFO [shutdown-hook-0] util.ShutdownHookManager (Logging.scala:logInfo(57)) - Shutdown hook called
2021-11-30 18:44:42,902 ERROR [SIGTERM handler] executor.CoarseGrainedExecutorBackend (SignalUtils.scala:$anonfun$registerLogger$2(43)) - RECEIVED SIGNAL TERM
2021-11-30 18:44:42,823 INFO [CoarseGrainedExecutorBackend-stop-executor] storage.BlockManager (Logging.scala:logInfo(57)) - BlockManager stopped
2021-11-30 18:44:42,822 INFO [CoarseGrainedExecutorBackend-stop-executor] memory.MemoryStore (Logging.scala:logInfo(57)) - MemoryStore cleared
2021-11-30 18:44:42,798 INFO [dispatcher-Executor] executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(57)) - Driver commanded a shutdown
How can I enable some kind of logging on the Spark driver to see what event on the driver triggered the executor shutdown? Neither the driver nor the executors are short of memory: pod metrics show them using far less than the limit + overhead. So it looks like the shutdown signal is not caused by a lack of resources, but possibly by some hidden exception that is not logged anywhere.
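One thing I can try is raising the log level for the driver-side classes that decide to stop executors. A minimal sketch, assuming Spark 3.2 or earlier with the bundled log4j 1.x; the two logger names below are the driver components that log dynamic-allocation and executor-removal decisions:

import org.apache.log4j.{Level, Logger}

// ExecutorAllocationManager logs dynamic-allocation decisions (e.g. idle timeouts);
// CoarseGrainedSchedulerBackend logs the actual executor stop/remove requests.
Logger.getLogger("org.apache.spark.ExecutorAllocationManager").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend").setLevel(Level.DEBUG)

// Or, much noisier, raise everything at once:
// spark.sparkContext.setLogLevel("DEBUG")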
Following @mazaneicha's suggestion, I tried setting longer timeouts, but I still get the same error:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

implicit val spark: SparkSession = SparkSession
  .builder
  .master("local[1]")
  .config(new SparkConf().setIfMissing("spark.master", "local[1]")
    .set("spark.eventLog.enabled", "true") // eventLog.dir has no effect unless event logging is enabled
    .set("spark.eventLog.dir", "file:///tmp/spark-events")
    .set("spark.dynamicAllocation.executorIdleTimeout", "100s")     // keep idle executors around longer
    .set("spark.dynamicAllocation.schedulerBacklogTimeout", "100s") // wait longer before requesting executors
  )
  .getOrCreate()
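Another option I am considering (a sketch, not something I have verified fixes this) is registering a SparkListener so the driver prints its own reason every time it removes an executor; onExecutorRemoved and its reason field are part of Spark's listener API:

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

// Print the driver's stated reason for each executor removal
// (e.g. an idle timeout from dynamic allocation, or a lost executor).
spark.sparkContext.addSparkListener(new SparkListener {
  override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit =
    println(s"Executor ${event.executorId} removed at ${event.time}: ${event.reason}")
})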