I have a k8s cluster with 1 master and 5 nodes. I am setting up EFK following this reference: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes#step-4-%E2%80%94-creating-the-fluentd-daemonset
After creating the Fluentd DaemonSet, 1 out of 5 fluentd pods is stuck in the ImagePullBackOff state:
kubectl get all -n kube-logging -o wide Tue Apr 21 03:49:26 2020
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 app=fluentd
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 app=fluentd
NAME READY STATUS RESTARTS AGE IP NODE
po/fluentd-82h6k 1/1 Running 1 1d 100.96.15.56 ip-172-20-52-52.us-west-1.compute.internal
po/fluentd-8ghjq 0/1 ImagePullBackOff 0 17h 100.96.10.170 ip-172-20-58-72.us-west-1.compute.internal
po/fluentd-fdmc8 1/1 Running 1 1d 100.96.3.73 ip-172-20-63-147.us-west-1.compute.internal
po/fluentd-g7755 1/1 Running 1 1d 100.96.2.22 ip-172-20-60-101.us-west-1.compute.internal
po/fluentd-gj8q8 1/1 Running 1 1d 100.96.16.17 ip-172-20-57-232.us-west-1.compute.internal
admin@ip-172-20-58-79:~$ kubectl describe po/fluentd-8ghjq -n kube-logging
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 12m (x4364 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Back-off pulling image "fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1"
Warning FailedSync 2m (x4612 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Error syncing pod
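The events above only report the back-off, not why the pull itself failed. As a minimal sketch (not part of the tutorial), assuming Python 3 and that the pod list was saved beforehand with "kubectl get pods -n kube-logging -o json > pods.json", the following lists every container waiting in ErrImagePull or ImagePullBackOff, together with its node and whatever message the kubelet recorded. The function name stuck_image_pulls and the file name pods.json are hypothetical, chosen only for illustration.

```python
import json
import sys

# Waiting reasons the kubelet reports while an image pull is failing.
PULL_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def stuck_image_pulls(pod_list):
    """Return (pod, node, image, reason, message) tuples for containers
    that are waiting on a failed image pull.

    pod_list is the parsed JSON from `kubectl get pods -o json`.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        node = pod.get("spec", {}).get("nodeName", "")
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in PULL_FAILURE_REASONS:
                stuck.append((
                    name,
                    node,
                    status.get("image", ""),
                    waiting["reason"],
                    waiting.get("message", ""),
                ))
    return stuck


# Only runs when a file path is given, e.g. `python3 stuck_pulls.py pods.json`.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for row in stuck_image_pulls(json.load(fh)):
            print("\t".join(row))
```

If the message column only repeats the back-off text, the underlying pull error is more likely to be found in the earlier pod events or in the container runtime logs on the affected node.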
Kubelet logs on the node where Fluentd fails to run:
admin@ip-172-20-58-72:~$ journalctl -u kubelet -f
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095334 755 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095369 755 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: W0421 03:53:53.095440 755 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Apr 21 03:53:54 ip-172-20-58-72 kubelet[755]: I0421 03:53:54.882213 755 server.go:779] GET /metrics/cadvisor: (50.308555ms) 200 [[Prometheus/2.12.0] 172.20.58.79:54492]
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: I0421 03:53:55.452951 755 kuberuntime_manager.go:500] Container {Name:fluentd Image:fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:FLUENT_ELASTICSEARCH_HOST Value:vpc-cog-01-es-dtpgkfi.ap-southeast-1.es.amazonaws.com ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_PORT Value:443 ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_SCHEME Value:https ValueFrom:nil} {Name:FLUENTD_SYSTEMD_CONF Value:disable ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:varlog ReadOnly:false MountPath:/var/log SubPath: MountPropagation:<nil>} {Name:varlibdockercontainers ReadOnly:true MountPath:/var/lib/docker/containers SubPath: MountPropagation:<nil>} {Name:fluentd-token-k8fnp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: E0421 03:53:55.455327 755 pod_workers.go:182] Error syncing pod aa65dd30-82f2-11ea-a005-0607d7cb72ed ("fluentd-8ghjq_kube-logging(aa65dd30-82f2-11ea-a005-0607d7cb72ed)"), skipping: failed to "StartContainer" for "fluentd" with ImagePullBackOff: "Back-off pulling image \"fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1\""
Kubelet logs on a node where Fluentd runs successfully:
admin@ip-172-20-63-147:~$ journalctl -u kubelet -f
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874293 1272 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874336 1272 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: W0421 04:09:25.874453 1272 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available