One of my 4 pods is in CrashLoopBackOff. Please guide me on how to resolve this issue.

$ kubectl get pods -n cog-prod01 -o wide

slotmachine-1688723297-5vlht          1/1       Running            0          21h       100.96.6.15     ip-172-21-61-42.compute.internal
slotmachine-1688723297-6plr9          1/1       Running            0          16h       100.96.13.16    ip-172-21-54-247.compute.internal
slotmachine-1688723297-k995t          1/1       Running            0          16h       100.96.11.186   ip-172-21-60-180.compute.internal
slotmachine-1688723297-sk8bn          0/1       CrashLoopBackOff   8          19m       100.96.2.72     ip-172-21-56-148.compute.internal
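
With more replicas, the failing pod is easier to spot with a small filter over the listing. This is only a sketch: it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, ...), and `unhealthy_pods` is a hypothetical helper name, not a kubectl feature.

```shell
# unhealthy_pods: read `kubectl get pods` output on stdin and print
# "name status restarts" for every pod whose READY column is not n/n.
# Assumption: default column order NAME READY STATUS RESTARTS ...
unhealthy_pods() {
  awk '
    $1 == "NAME" { next }   # skip the header row if present
    NF < 4       { next }   # skip blank or malformed lines
    {
      split($2, ready, "/")
      if (ready[1] != ready[2]) print $1, $3, $4
    }'
}

# Against a live cluster (not run here):
#   kubectl get pods -n cog-prod01 -o wide | unhealthy_pods
```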

Kubelet logs on the node:

admin@ip-172-21-56-148:~$ journalctl -u kubelet -f

    Jan 07 02:44:36 ip-172-21-56-148 kubelet[1568]: W0107 02:44:36.351880    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: W0107 02:44:46.372270    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443776    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443851    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.592800    1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerStarted", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
    Jan 07 02:44:56 ip-172-21-56-148 kubelet[1568]: W0107 02:44:56.409374    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.669027    1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerDied", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971547    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3aa.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971640    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971770    1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: E0107 02:45:00.971805    1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:06 ip-172-21-56-148 kubelet[1568]: W0107 02:45:06.447068    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.149685    1568 status_manager.go:418] Status for pod "2bc8665e-30f5-11ea-a92d-024aeca0bafc" is up-to-date; skipping
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.443951    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b35a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444070    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444198    1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: E0107 02:45:12.444238    1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:13 ip-172-21-56-148 kubelet[1568]: I0107 02:45:13.938976    1568 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
    Jan 07 02:45:16 ip-172-21-56-148 kubelet[1568]: W0107 02:45:16.464693    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
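
The "Back-off 5m0s" in these lines is the kubelet's restart delay, not the cause of the crash. Per the Kubernetes documentation, the delay starts at 10s, doubles after each restart, and is capped at five minutes. A minimal sketch of that schedule follows; `backoff_delay` is a hypothetical helper, and the exact restart at which a real kubelet reaches the cap may differ slightly.

```shell
# backoff_delay N: print the approximate delay in seconds before restart N
# (N = 1 is the first restart). Base 10s, doubling, capped at 300s.
backoff_delay() {
  n=$1
  delay=10
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Print the schedule for the first 8 restarts.
for r in 1 2 3 4 5 6 7 8; do
  echo "restart $r: $(backoff_delay "$r")s"
done
```

With 8 restarts, as in the listing, the delay has already reached the 300s cap, which matches the 5m0s in the log.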

admin@ip-172-21-43-86:~$ kubectl describe po -n cog-prod01 slotmachine-1688723297-sk8bn

Events:
  FirstSeen     LastSeen        Count   From                                                            SubObjectPath                   Type            Reason                  Message
  ---------     --------        -----   ----                                                            -------------                   --------        ------                  -------
  27m           27m             1       default-scheduler                                                                               Normal          Scheduled               Successfully assigned slotmachine-1688723297-sk8bn to ip-172-21-56-148.compute.internal
  27m           27m             1       kubelet, ip-172-21-56-148.compute.internal                                       Normal          SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "slotmachine-logs"
  27m           27m             1       kubelet, ip-172-21-56-148.compute.internal                                       Normal          SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-9bxjf"
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Pulled                  Container image "gt/slotmachine:develop.6590.xxxx.2866" already present on machine
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Created                 Created container
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Started                 Started container
  27m           11s             113     kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Warning         BackOff                 Back-off restarting failed container
  27m           11s             113     kubelet, ip-172-21-56-148.compute.internal                                       Warning         FailedSync              Error syncing pod

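Note: Disk space, CPU, and memory on the node running this pod all look fine. According to the pod logs, it cannot connect to the config service, but the other 3 pods can connect to it, so I can't figure out what is going wrong here!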
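
Because the three healthy replicas run on other nodes, one way to isolate the problem is to send the same request from a throwaway pod pinned to the affected node. This is a sketch under assumptions: it presumes a busybox image can be pulled in this cluster, and the pod name `nettest` and the helper `pin_to_node_override` are hypothetical.

```shell
# pin_to_node_override NODE: print a `kubectl run --overrides` JSON snippet
# that forces the pod onto NODE by setting spec.nodeName.
pin_to_node_override() {
  printf '{"apiVersion":"v1","spec":{"nodeName":"%s"}}\n' "$1"
}

# Against a live cluster (not run here):
#   kubectl run nettest -n cog-prod01 --rm -it --restart=Never --image=busybox \
#     --overrides="$(pin_to_node_override ip-172-21-56-148.compute.internal)" \
#     -- wget -qO- -T 5 http://configservice:8888/slotmachine/cog,cog-prod01
```

If the request also fails from this pod but succeeds from one pinned to a healthy node, the problem is more likely node-level networking than the application.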

admin@ip-172-21-43-86:~$ kubectl logs -n  cog-prod01 slotmachine-1688723297-sk8bn


03:01:02.104 [main] INFO  org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Fetching config from server at: http://configservice:8888
03:01:05.344 [main] WARN  org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Could not locate PropertySource: I/O error on GET request for "http://configservice:8888/slotmachine/cog,cog-prod01": No route to host (Host unreachable); nested exception is java.net.NoRouteToHostException: No route to host (Host unreachable)
03:01:05.381 [main] INFO  org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext - Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@77eca502: startup date [Tue Jan 07 03:01:05 UTC 2020]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@4fb0f2b9
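
This excerpt ends before any fatal error, so it does not show why the container exits. The log of the previous container instance and its last exit code may help. The sketch below interprets an exit code using the common 128 + signal convention; `explain_exit_code` is a hypothetical helper, and the kubectl commands in the comments assume the first container in the pod is the one crashing.

```shell
# explain_exit_code CODE: give a rough reading of a container exit code.
# Codes above 128 conventionally mean "killed by signal (CODE - 128)".
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "killed by SIGKILL (possibly OOM)" ;;
      15) echo "terminated by SIGTERM" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "application error $code"
  fi
}

# Against a live cluster (not run here):
#   kubectl logs -n cog-prod01 slotmachine-1688723297-sk8bn --previous
#   code=$(kubectl get pod -n cog-prod01 slotmachine-1688723297-sk8bn \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```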
2 Answers

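One or more nodes do not have enough free capacity, so the scheduler cannot deploy your 4th pod. You can check this with kubectl describe nodes. For a detailed explanation, see my answer to "GKE Insufficient CPU for small Node.js app pods".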
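
The container spec in the kubelet log requests cpu 200m and memory 5G. To compare such a request with the allocatable memory reported by kubectl describe nodes, the quantity suffixes have to be converted to a common unit. A rough sketch, handling whole-number quantities only; `mem_to_bytes` is a hypothetical helper.

```shell
# mem_to_bytes QUANTITY: convert a Kubernetes memory quantity to bytes.
# Decimal suffixes (k, M, G) are powers of 1000; binary suffixes
# (Ki, Mi, Gi) are powers of 1024. Fractional values are not handled.
mem_to_bytes() {
  q=$1
  case "$q" in
    *Ki) echo $(( ${q%Ki} * 1024 )) ;;
    *Mi) echo $(( ${q%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${q%k} * 1000 )) ;;
    *M)  echo $(( ${q%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${q%G} * 1000 * 1000 * 1000 )) ;;
    *)   echo "$q" ;;   # plain number: already bytes
  esac
}

# The 5G request from the kubelet log, in bytes:
mem_to_bytes 5G
```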

Answered 2020-01-07T04:22:04.287

Check whether Kube Proxy is working properly on your nodes.

Here is a guide to debugging Kube Proxy.
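
One check from that kind of guide can be sketched as follows: on the affected node, count the iptables rules kube-proxy has written for the service. This assumes kube-proxy runs in iptables mode, where rules carry a "namespace/name" comment; `count_service_rules` is a hypothetical helper.

```shell
# count_service_rules NS/NAME: read `iptables-save` output on stdin and
# print how many lines mention the given namespace/service name.
count_service_rules() {
  # grep -c exits non-zero when the count is 0, so mask the status.
  grep -c -F -- "$1" || true
}

# On the affected node (not run here):
#   sudo iptables-save | count_service_rules cog-prod01/configservice
```

A count of 0 on the failing node, with a non-zero count on a healthy node, would suggest that kube-proxy is not programming rules there.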

Answered 2020-01-07T04:35:57.457