
Note: this error is raised before Spark executes the component.

Logs

Worker node 1:

17/05/18 23:12:52 INFO Worker: Successfully registered with master spark://spark-master-1.com:7077  
17/05/18 23:58:41 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM

Master node:

17/05/18 23:12:52 INFO Master: Registering worker spark-worker-1com:56056 with 2 cores, 14.5 GB RAM
17/05/18 23:14:20 INFO Master: Registering worker spark-worker-2.com:53986 with 2 cores, 14.5 GB RAM
17/05/18 23:59:42 WARN Master: Removing spark-worker-1com-56056 because we got no heartbeat in 60 seconds
17/05/18 23:59:42 INFO Master: Removing spark-worker-2.com:56056
17/05/19 00:00:03 ERROR Master: RECEIVED SIGNAL 15: SIGTERM

Worker node 2:

17/05/18 23:14:20 INFO Worker: Successfully registered with master spark://spark-master-node-2.com:7077
17/05/18 23:59:40 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM

1 Answer

TL;DR: I think someone explicitly called the kill command or sbin/stop-worker.sh.

"RECEIVED SIGNAL 15: SIGTERM" is reported by a signal handler that Spark registers to log the TERM, HUP, and INT signals on UNIX-like systems:

  /** Register a signal handler to log signals on UNIX-like systems. */
  def registerLogger(log: Logger): Unit = synchronized {
    if (!loggerRegistered) {
      Seq("TERM", "HUP", "INT").foreach { sig =>
        SignalUtils.register(sig) {
          log.error("RECEIVED SIGNAL " + sig)
          false
        }
      }
      loggerRegistered = true
    }
  }

In your case, it means the process received SIGTERM asking it to stop:

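"The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate."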

That is what gets sent when you run kill or use the ./sbin/stop-master.sh or ./sbin/stop-worker.sh shell scripts, which in turn call sbin/spark-daemon.sh with the stop command to kill the JVM process of the master or worker:

kill "$TARGET_ID" && rm -f "$pid"
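
To see the same behaviour outside Spark, here is a minimal sketch in plain shell. It is not Spark code: the function name demo_sigterm and the log text echoed by the child are made up for illustration, chosen only to mimic the log line above. A child process installs a TERM handler, the parent sends it a plain kill (which defaults to SIGTERM, as in the spark-daemon.sh line), and the child logs the signal before exiting.

```shell
# Hypothetical demo, not part of Spark: shows that a plain `kill` delivers
# SIGTERM, which a process can catch and log before shutting down.
demo_sigterm() {
  out="$(mktemp)"

  # Child: install a TERM handler first, then announce readiness and idle.
  # The handler logs a line (imitating the Spark log) and exits with 128 + 15.
  sh -c 'trap "echo RECEIVED SIGNAL 15: SIGTERM; exit 143" TERM
         echo ready
         while :; do sleep 0.1; done' >"$out" &
  pid=$!

  # Wait until the child has printed "ready", i.e. the handler is installed.
  until grep -q ready "$out"; do sleep 0.1; done

  # Same form as spark-daemon.sh: no signal given, so TERM is sent.
  kill "$pid"

  # Collect the child's exit status without tripping `set -e`.
  status=0
  wait "$pid" || status=$?

  grep RECEIVED "$out"
  echo "exit status: $status"
  rm -f "$out"
}
```

Calling demo_sigterm should print the logged signal line followed by "exit status: 143". The point is that a worker or master logging this message was asked to terminate by something outside the JVM (an operator, a script, or a cluster manager), rather than crashing on its own.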
answered 2017-05-24T13:14:22.277