Background
In the Configure Liveness, Readiness and Startup Probes documentation you can find the following information:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
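For reference, the pod used in the tests below is based on the liveness-http example from that documentation. A minimal sketch of such a pod, assuming the probe parameters discussed later in this answer (initialDelaySeconds: 20 and failureThreshold: 3 for the liveness probe, a 10-second readiness period, and a 60-second liveness period inferred from the log timestamps), might look like:

# Sketch of the test pod; the probe values are assumptions based on the
# configuration referenced later in this answer, not a verbatim copy.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args: ["/server"]
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 20
      periodSeconds: 60
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
EOF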
As the GKE master is managed by Google, you won't find the kubelet logs using the CLI (you might try using Stackdriver). I have tested this on a Kubeadm cluster with the verbosity level set to 8.
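A minimal sketch of how to do this on a kubeadm node (the env file location varies by distribution, e.g. /etc/sysconfig/kubelet on RPM-based systems):

# Raise kubelet log verbosity on a kubeadm node (sketch; paths may vary).
echo 'KUBELET_EXTRA_ARGS="--v=8"' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
# Then follow the kubelet logs:
sudo journalctl -u kubelet -f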
When you use $ kubectl get events you only get events from the last hour. This retention is configurable in Kubernetes (on Kubeadm, via the kube-apiserver --event-ttl flag, which defaults to 1h), but I don't think it can be changed on GKE, as the master is managed by Google.
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
37m Normal Starting node/kubeadm Starting kubelet.
...
33m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to kubeadm
33m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
33m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness" in 893.953679ms
33m Normal Created pod/liveness-http Created container liveness
33m Normal Started pod/liveness-http Started container liveness
3m12s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
30m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
8m17s Warning BackOff pod/liveness-http Back-off restarting failed container
The same command executed again after ~1 hour:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
33s Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
5m40s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
15m Warning BackOff pod/liveness-http Back-off restarting failed container
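The one-hour window comes from the --event-ttl flag mentioned above. On a kubeadm control plane, a sketch of how to extend it (kube-apiserver runs as a static pod, so editing the manifest restarts it automatically):

# Sketch: extend event retention on a kubeadm control plane.
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# add under spec.containers[0].command:
#   - --event-ttl=4h
# Independently of retention, events for a single object can be filtered:
kubectl get events --field-selector involvedObject.name=liveness-http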
Tests
The Readiness Probe check was executed every 10 seconds for longer than one hour.
Mar 09 14:48:34 kubeadm kubelet[3855]: I0309 14:48:34.222085 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:44 kubeadm kubelet[3855]: I0309 14:48:44.221782 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:54 kubeadm kubelet[3855]: I0309 14:48:54.221828 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:01:34 kubeadm kubelet[3855]: I0309 15:01:34.222491 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:44 kubeadm kubelet[3855]: I0309 15:01:44.221877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:54 kubeadm kubelet[3855]: I0309 15:01:54.221976 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:10:14 kubeadm kubelet[3855]: I0309 15:10:14.222163 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:24 kubeadm kubelet[3855]: I0309 15:10:24.221744 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:34 kubeadm kubelet[3855]: I0309 15:10:34.223877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 16:04:14 kubeadm kubelet[3855]: I0309 16:04:14.222853 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:04:24 kubeadm kubelet[3855]: I0309 16:04:24.222531 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Additionally, there are Liveness probe entries:
Mar 09 16:12:58 kubeadm kubelet[3855]: I0309 16:12:58.462878 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:13:58 kubeadm kubelet[3855]: I0309 16:13:58.462906 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465587 3855 kuberuntime_manager.go:712] Killing unwanted container "liveness"(id={"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) for pod "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a)"
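The probe entries above were collected straight from the kubelet journal; a sketch of the commands used to filter them (requires the raised verbosity from the Background section):

# Filter probe results out of the kubelet logs.
sudo journalctl -u kubelet | grep "Readiness probe"
sudo journalctl -u kubelet | grep -i "liveness probe"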
Total test time:
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
liveness-http 0/1 Running 21 99m
liveness-http 0/1 CrashLoopBackOff 21 101m
liveness-http 0/1 Running 22 106m
liveness-http 1/1 Running 22 106m
liveness-http 0/1 Running 22 106m
liveness-http 0/1 Running 23 109m
liveness-http 1/1 Running 23 109m
liveness-http 0/1 Running 23 109m
liveness-http 0/1 CrashLoopBackOff 23 112m
liveness-http 0/1 Running 24 117m
liveness-http 1/1 Running 24 117m
liveness-http 0/1 Running 24 117m
Conclusion
Liveness probe check is not called anymore

The Liveness check is created when Kubernetes creates the pod, and is recreated each time the Pod is restarted. In your configuration you have set initialDelaySeconds: 20, so after the pod is created, Kubernetes will wait 20 seconds and then call the liveness probe 3 times (as you have set failureThreshold: 3). After 3 failures, Kubernetes will restart the pod according to its RestartPolicy. In the logs you will be able to find:
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Why was it restarted? The answer can be found in Container probes:

livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

The default restart policy in GKE is Always, so your pod will be restarted over and over again.
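You can confirm this on your own pod; restartPolicy defaults to Always when it is not set explicitly:

# Print the effective restart policy of the pod.
kubectl get pod liveness-http -o jsonpath='{.spec.restartPolicy}'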
Readiness probe check was called, but the interval became longer (the maximum interval looks to be about 10 minutes)
I think you reached this conclusion because you based it on $ kubectl get events and $ kubectl describe po. In both cases, events are deleted by default after 1 hour. In my Tests section you can see that the Readiness probe entries run from 14:48:34 until 16:04:24, so Kubernetes called the Readiness Probe every 10 seconds throughout.
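Rather than inferring the interval from events, you can read the configured probe periods directly from the pod spec; a sketch, assuming the container is the first one in the spec:

# Show the configured probe intervals.
kubectl get pod liveness-http \
  -o jsonpath='readiness: {.spec.containers[0].readinessProbe.periodSeconds}s, liveness: {.spec.containers[0].livenessProbe.periodSeconds}s{"\n"}'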
Why did the Liveness probe not run, while the Readiness probe interval changed?
As I showed you in the Tests section, the Readiness probe did not change. What is misleading in this case is relying on $ kubectl get events. As for the Liveness Probe, it is still being called, but only 3 times after the pod is created/restarted. I have also included the $ kubectl get po -w output: each time the pod was recreated, those liveness probes could be found in the kubelet logs.
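To reproduce this correlation yourself, watch the pod in one terminal while following the kubelet journal in another; a sketch:

# Terminal 1: watch restarts and READY transitions.
kubectl get po liveness-http -w
# Terminal 2: follow liveness probe activity in the kubelet logs.
sudo journalctl -u kubelet -f | grep -i "liveness"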
If I were you, I would create an alerting policy with a condition like the following:

If the liveness probe fails 3 times, with your current settings it will restart the pod. In that case you can create an alert on each restart, using:
Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
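A rough sketch of such a policy as a Cloud Monitoring policy file (the JSON field names and values here are assumptions; check them against the current Monitoring API documentation before use):

# Hypothetical alert policy: fire whenever restart_count increases.
cat > restart-alert.json <<'EOF'
{
  "displayName": "liveness-http container restarts",
  "combiner": "OR",
  "conditions": [{
    "displayName": "restart_count increased",
    "conditionThreshold": {
      "filter": "metric.type=\"kubernetes.io/container/restart_count\" AND resource.type=\"k8s_container\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 0,
      "duration": "0s",
      "aggregations": [{
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_DELTA"
      }]
    }
  }]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=restart-alert.json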
You can find some useful information about Monitoring alerts in Stack Overflow threads such as: