I just upgraded my Kubernetes cluster from 1.10.0 to 1.10.12.

I also updated one or two of the nodes to the same version.

However, I now see:

kube-proxy-r5ts5                                0/1       CrashLoopBackOff   5          3m        134.79.129.110   gpu03

Showing the logs gives:

# kubectl -n kube-system logs -f kube-proxy-r5ts5
error: unrecognized key:

Any help? I don't know how to troubleshoot this any further.

Coincidentally, I added a new node at the same time, and I see that weave also has trouble starting there:

# kubectl -n kube-system logs -f weave-net-mb299 weave
FATA: 2018/12/20 01:43:35.703088 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers

# kubectl -n kube-system logs -f weave-net-mb299 weave-npc
ERROR: logging before flag.Parse: E1220 01:44:02.447197   28249 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:230: Failed to list *v1.NetworkPolicy: Get https://10.96.0.1:443/apis/networking.k8s.io/v1/networkpolicies?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

I assume this is because kube-proxy is not starting.

# kubectl -n kube-system describe pods kube-proxy-r5ts5
Name:           kube-proxy-r5ts5
Namespace:      kube-system
Node:           gpu02/134.79.129.96
Start Time:     Thu, 20 Dec 2018 02:01:10 +0000
Labels:         controller-revision-hash=3231443654
                k8s-app=kube-proxy
                pod-template-generation=4
Annotations:    <none>
Status:         Running
IP:             134.79.129.96
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  docker://1bcfca6db8f68d7130de86947343a24f9fc23b506ea295509933473f3d830845
    Image:         gcr.io/google_containers/kube-proxy-amd64:v1.10.12
    Image ID:      docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:a9ed73c3526033cd3cf732b4a84de9d211f425ef08cce4f0535617cadf0f4200
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 20 Dec 2018 02:04:00 +0000
      Finished:     Thu, 20 Dec 2018 02:04:00 +0000
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-m4hvr (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-proxy-token-m4hvr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-proxy-token-m4hvr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason                 Age              From                 Message
  ----     ------                 ----             ----                 -------
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "xtables-lock"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "kube-proxy"
  Normal   SuccessfulMountVolume  3m               kubelet, gpu02  MountVolume.SetUp succeeded for volume "kube-proxy-token-m4hvr"
  Normal   Started                2m (x4 over 3m)  kubelet, gpu02  Started container
  Warning  BackOff                2m (x7 over 3m)  kubelet, gpu02  Back-off restarting failed container
  Normal   Pulled                 2m (x5 over 3m)  kubelet, gpu02  Container image "gcr.io/google_containers/kube-proxy-amd64:v1.10.12" already present on machine
  Normal   Created                2m (x5 over 3m)  kubelet, gpu02  Created container
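
The describe output above shows kube-proxy started with --config=/var/lib/kube-proxy/config.conf, and that directory is populated from the kube-proxy ConfigMap. A minimal sketch for dumping that ConfigMap, to look for a key the v1.10.12 binary might reject. The function name dump_kube_proxy_config is my own, and it is only an assumption that the "unrecognized key" error refers to this file:

```shell
# Hypothetical helper: print the kube-proxy ConfigMap as YAML.
# The namespace defaults to kube-system, where the pod above runs.
dump_kube_proxy_config() {
  kubectl -n "${1:-kube-system}" get configmap kube-proxy -o yaml
}

# Usage (needs kubectl access to the cluster):
#   dump_kube_proxy_config
```

The config.conf entry under data: in the output is the file kube-proxy parses at startup.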

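Possibly unrelated, but I did run into a problem with cri-tools: kubeadm join complained that it could not find dockershim.sock. So I ran rpm -e --nodeps cri-tools, which seemed to fix the join. I'm fairly sure the Docker subsystem is working, because I can see other Kubernetes pods on the machine (for example k8s_POD_weave-net-mb299_kube-system and k8s_weave-npc_weave-net-mb299_kube-system).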

A log snapshot from one of the minions:

Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.459850   10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637709   10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637826   10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637852   10526 kuberuntime_manager.go:646] createPodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637947   10526 pod_workers.go:186] Error syncing pod bd2287cb-0475-11e9-90de-fa163e21c438 ("hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"hub-85c95bbd57-bx4sr_jupyter-prod\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.661793   10526 container.go:507] Failed to update stats for container "/libcontainer_14802_systemd_test_default.slice": read /sys/fs/cgroup/cpu,cpuacct/libcontainer_14802_systemd_test_default.slice/cpuacct.usage: no such device, continuing to push stats
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745423   10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745492   10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745526   10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745640   10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.858313   10526 pod_container_deletor.go:77] Container "e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616" not found in pod's containers
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.934213   10526 pod_container_deletor.go:77] Container "de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e" not found in pod's containers
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696842   10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696892   10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697306   10526 container.go:393] Failed to create summary reader for "/libcontainer_14936_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697520   10526 container.go:393] Failed to create summary reader for "/libcontainer_14941_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708833   10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708860   10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.860952   10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861039   10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861067   10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861167   10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954796   10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954851   10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
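
The snapshot above is repetitive, so here is a small sketch that reduces kubelet output like it to the distinct pods whose CNI setup failed. The function name failing_pods is my own, and the pattern is only matched against the message format shown in this snapshot:

```shell
# Hypothetical helper: read kubelet log lines on stdin and print each
# distinct pod name that appears in a 'failed to set up pod "..."' message.
failing_pods() {
  grep -o 'failed to set up pod "[^"]*"' | cut -d'"' -f2 | sort -u
}

# Usage on the node (assuming kubelet logs to the systemd journal):
#   journalctl -u kubelet | failing_pods
```

Messages where the pod name is wrapped in escaped quotes (\") are skipped on purpose, since they repeat the same error inside a quoted string.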