I just upgraded my Kubernetes cluster from 1.10.0 to 1.10.12.
I also updated one or two of the nodes to the same version.
However, I now see:
kube-proxy-r5ts5 0/1 CrashLoopBackOff 5 3m 134.79.129.110 gpu03
Showing the logs gives:
# kubectl -n kube-system logs -f kube-proxy-r5ts5
error: unrecognized key:
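Help? I don't know how to troubleshoot this any further.
Coincidentally, I added a new node at the same time, and I see that weave also has problems starting there: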
# kubectl -n kube-system logs -f weave-net-mb299 weave
FATA: 2018/12/20 01:43:35.703088 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers
# kubectl -n kube-system logs -f weave-net-mb299 weave-npc
ERROR: logging before flag.Parse: E1220 01:44:02.447197 28249 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:230: Failed to list *v1.NetworkPolicy: Get https://10.96.0.1:443/apis/networking.k8s.io/v1/networkpolicies?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I assume this is because kube-proxy is not starting.
# kubectl -n kube-system describe pods kube-proxy-r5ts5
Name:           kube-proxy-r5ts5
Namespace:      kube-system
Node:           gpu02/134.79.129.96
Start Time:     Thu, 20 Dec 2018 02:01:10 +0000
Labels:         controller-revision-hash=3231443654
                k8s-app=kube-proxy
                pod-template-generation=4
Annotations:    <none>
Status:         Running
IP:             134.79.129.96
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  docker://1bcfca6db8f68d7130de86947343a24f9fc23b506ea295509933473f3d830845
    Image:         gcr.io/google_containers/kube-proxy-amd64:v1.10.12
    Image ID:      docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:a9ed73c3526033cd3cf732b4a84de9d211f425ef08cce4f0535617cadf0f4200
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 20 Dec 2018 02:04:00 +0000
      Finished:     Thu, 20 Dec 2018 02:04:00 +0000
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-m4hvr (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-proxy-token-m4hvr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-proxy-token-m4hvr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason                 Age              From            Message
  ----     ------                 ----             ----            -------
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "xtables-lock"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "kube-proxy"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "kube-proxy-token-m4hvr"
  Normal   Started                2m (x4 over 3m)  kubelet, gpu02  Started container
  Warning  BackOff                2m (x7 over 3m)  kubelet, gpu02  Back-off restarting failed container
  Normal   Pulled                 2m (x5 over 3m)  kubelet, gpu02  Container image "gcr.io/google_containers/kube-proxy-amd64:v1.10.12" already present on machine
  Normal   Created                2m (x5 over 3m)  kubelet, gpu02  Created container
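One thing that might be worth checking, given the describe output: the container is started with --config=/var/lib/kube-proxy/config.conf, and /var/lib/kube-proxy is mounted from the kube-proxy ConfigMap, so the "unrecognized key" error may well be about the contents of that ConfigMap. The sketch below only dumps it for inspection. The helper name dump_kube_proxy_config is hypothetical, and it assumes kubectl is on the PATH and pointed at this cluster.

```shell
# Sketch: print the kube-proxy ConfigMap so the config.conf payload can be read.
# Assumption: kubectl is installed and configured for this cluster.
# The ConfigMap name (kube-proxy) and namespace (kube-system) come from the
# describe output above; the function name itself is made up for illustration.
dump_kube_proxy_config() {
  kubectl -n kube-system get configmap kube-proxy -o yaml
}
```

Running dump_kube_proxy_config and reading through the config.conf entry would at least show which keys kube-proxy is being handed at startup.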
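Possibly unrelated, but I did run into a problem with cri-tools: kubeadm join said it could not find dockershim.sock. So I ran rpm -e --nodeps cri-tools, which seemed to fix the join. I'm fairly sure the docker subsystem is working, because I can see other Kubernetes pods on the machine (e.g. k8s_POD_weave-net-mb299_kube-system, k8s_weave-npc_weave-net-mb299_kube-system).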
A snapshot of the logs from one of the minions:
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.459850 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637709 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637826 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637852 10526 kuberuntime_manager.go:646] createPodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637947 10526 pod_workers.go:186] Error syncing pod bd2287cb-0475-11e9-90de-fa163e21c438 ("hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"hub-85c95bbd57-bx4sr_jupyter-prod\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.661793 10526 container.go:507] Failed to update stats for container "/libcontainer_14802_systemd_test_default.slice": read /sys/fs/cgroup/cpu,cpuacct/libcontainer_14802_systemd_test_default.slice/cpuacct.usage: no such device, continuing to push stats
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745423 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745492 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745526 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745640 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.858313 10526 pod_container_deletor.go:77] Container "e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616" not found in pod's containers
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.934213 10526 pod_container_deletor.go:77] Container "de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e" not found in pod's containers
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696842 10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696892 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697306 10526 container.go:393] Failed to create summary reader for "/libcontainer_14936_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697520 10526 container.go:393] Failed to create summary reader for "/libcontainer_14941_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708833 10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708860 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.860952 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861039 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861067 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861167 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954796 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954851 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
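The kubelet log above repeats the same CNI failure many times, so a small filter can make it easier to see which pods are affected. This is only a sketch: the function name summarize_cni_errors is hypothetical, and it keys on the literal "failed to set up pod" text that appears in these lines.

```shell
# Sketch: count "failed to set up pod" occurrences per pod in kubelet log text.
# Reads log lines on stdin and prints "<count> <pod>" sorted by pod name.
# Both plain ("pod") and escaped (\"pod\") quoting are matched, because the
# kubelet repeats the message inside a quoted error string on some lines.
summarize_cni_errors() {
  grep -o 'failed to set up pod \\*"[^"\\]*' \
    | sed 's/.*"//' \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'
}
```

Assuming the kubelet runs as a systemd unit named kubelet, it could be fed with something like journalctl -u kubelet | summarize_cni_errors. Note that it counts occurrences rather than lines, so a line that quotes the message twice contributes two.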