kubernetes - Kubernetes - 主节点中的 kube-system pod 在工作节点加入后不断重启

Question

我已按照本教程和本教程以及本教程进行操作，但过去 3 天我面临同样的问题。

我可以通过以下步骤正确设置主节点：

kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

export kubever=$(kubectl version | base64 | tr -d ‘\’)
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

一切似乎都很好

kubectl get all --namespace=kube-system

然后，

在工作节点上：

kubeadm join --token 864655.fdf6d0b389867b79 192.168.100.17:6443 --discovery-token-ca-cert-hash sha256:a2d840808b17b53b9612e6271ccde489f13dbede7d354f97188d0faa9e210af2

输出看起来不错，如下所示：

[preflight] Running pre-flight checks.
  [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.100.17:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.100.17:6443"
[discovery] Requesting info from "https://192.168.100.17:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.100.17:6443"
[discovery] Successfully established connection with API Server "192.168.100.17:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

但是一旦我运行这个命令，所有的地狱都会崩溃。这

kubectl get all --namespace=kube-system

开始显示所有 pod 都在一直在重新启动。状态在 Pending 和 Running 之间不断变化，有时一些 pod 甚至会消失并且可能具有 ContainerCreating 状态等。

NAME                                READY     STATUS    RESTARTS   AGE
po/etcd-ubuntu                      0/1       Pending   0          0s
po/kube-controller-manager-ubuntu   0/1       Pending   0          0s
po/kube-dns-6f4fd4bdf-cmcfk         3/3       Running   0          13m
po/kube-proxy-2chb6                 1/1       Running   0          13m
po/kube-scheduler-ubuntu            0/1       Pending   0          0s
po/weave-net-ptdxr                  2/2       Running   0          11m

我还尝试了第二个教程，使用法兰绒，并得到了完全相同的问题。

我的设置

我创建了两个新的虚拟机，在 VMware 上全新安装了 Ubuntu 17.10，每个虚拟机有 2 个处理器/2 核 6 GB 内存和 50 GB 硬盘。我的物理机是 i7-6700k，内存为 32gb。我在它们上面都安装了 kubeadm、kubelet 和 docker，然后按照上面提到的步骤进行操作。

我还尝试在 VMware 上的 NAT 和 Bridge 之间切换，但没有任何改变。

两个具有桥接网络的虚拟机的初始 IP 分别为 192.168.100.12 和 192.168.100.17。对于hostname -I大师：

192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2

对于hostname -I工作节点：

192.168.100.12 172.17.0.1 10.44.0.0 10.32.0.1

journalctl -xeu kubelet显示以下内容：

https://gist.github.com/saad749/9a771a3460bf88c274498b5bc4b7fd84

在尝试使用法兰绒（仍然是同样的问题）时，结果来自

kubectl describe nodes

是

https://gist.github.com/saad749/d24c453c8b4e663e9abf572a0fb38bf4

我错过了 kubeadm init 之前的任何步骤吗？我应该更改 IP 地址（更改为什么）？我应该查看任何特定的日志吗？有没有更全面的教程？所有问题都在工作节点上的 kubeadm 加入后开始，我可以在主节点或任何其他东西上部署 kubernetes，它工作正常。

更新：

即使应用了errordeveloper的建议，同样的问题仍然存在。

我将以下标志添加到 kubeadm init：

--apiserver-advertise-address 192.168.100.17

我将 kubeadm.conf 更新为以下内容并重新加载并重新启动： https ://gist.github.com/saad749/c7149c87ec3e75a40586f626cf04279a

并尝试更改集群 dns https://gist.github.com/saad749/5fa66bebc22841e58119333e75600e40

这是初始化主服务器后的日志：

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      1/1       Running   0          22s       192.168.100.17   ubuntu
kube-system   kube-apiserver-ubuntu            1/1       Running   0          29s       192.168.100.17   ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          13s       192.168.100.17   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         3/3       Running   0          1m        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          1m        192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running   0          34s       192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   0          32s       192.168.100.17   ubuntu

主机名 -i 结果：

kube-master@ubuntu:~$ hostname -I
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4 10.32.0.5 10.32.0.6 10.244.0.0 10.244.0.1
kube-master@ubuntu:~$ hostname -i
192.168.100.17

由于。。。导致的结果：

kubectl describe nodes

https://gist.github.com/saad749/8f460650182a04d0ddf3158a52761a9a

内部 IP 现在似乎是正确的。

从第二个节点加入后，会发生这种情况：

kube-master@ubuntu:~$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
ubuntu    Ready     master    49m       v1.9.3
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE       IP               NODE
kube-system   kube-controller-manager-ubuntu   0/1       Pending             0          0s        <none>           ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         0/3       ContainerCreating   0          49m       <none>           ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running             0          49m       192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running             0          1s        192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running             0          48m       192.168.100.17   ubuntu

ifconfig -a 结果：

https://gist.github.com/saad749/63a5a52bd3246ff72477b2aca7d158d0

journalctl -xeu kubelet 结果

https://gist.github.com/saad749/8a60870b35f93df8565e66cb208aff32

有时，Pod 的 IP 显示为 192.168.100.12，这是非主第二节点的 IP。

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-apiserver-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          0s        192.168.100.12   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         2/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   kube-scheduler-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   1          3h        192.168.100.17   ubuntu

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   kube-dns-6f4fd4bdf-wfqhb   3/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9           1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   weave-net-fkgnh            2/2       Running   0          3h        192.168.100.12   ubuntu


kubectl describe nodes
Name:               ubuntu
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=ubuntu
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             node-role.kubernetes.io/master:NoSchedule
CreationTimestamp:  Fri, 02 Mar 2018 08:21:47 -0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 11:28:25 -0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.100.12
  Hostname:    ubuntu
Capacity:
 cpu:     4
 memory:  6080832Ki
 pods:    110
Allocatable:
 cpu:     4
 memory:  5978432Ki
 pods:    110
System Info:
 Machine ID:                 59bf65b835b242a3aa182f4b8a542219
 System UUID:                0C3C4D56-4747-D59E-EE09-F16F2793677E
 Boot ID:                    658b4a08-d724-425e-9246-2b41995ecc46
 Kernel Version:             4.13.0-36-generic
 OS Image:                   Ubuntu 17.10
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.3
 Kube-Proxy Version:         v1.9.3
ExternalID:                  ubuntu
Non-terminated Pods:         (3 in total)
  Namespace                  Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                        ------------  ----------  ---------------  -------------
  kube-system                kube-dns-6f4fd4bdf-wfqhb    260m (6%)     0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-proxy-h4hz9            0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                weave-net-fkgnh             20m (0%)      0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  280m (7%)     0 (0%)      110Mi (1%)       170Mi (2%)
Events:
  Type     Reason                   Age                 From             Message
  ----     ------                   ----                ----             -------
  Warning  Rebooted                 12m (x814 over 2h)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
  Normal   NodeHasNoDiskPressure    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasNoDiskPressure
  Normal   Starting                 10m                 kubelet, ubuntu  Starting kubelet.
  Normal   NodeAllocatableEnforced  10m                 kubelet, ubuntu  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientMemory
  Normal   NodeNotReady             10m                 kubelet, ubuntu  Node ubuntu status is now: NodeNotReady
  Warning  Rebooted                 2m (x870 over 2h)   kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
  Warning  Rebooted                 15s (x60 over 10m)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0

我究竟做错了什么？

score 5 · Accepted Answer

因此，在遵循@errordeveloper 的建议并仍然碰壁之后，我能够解决原来非常简单的问题。

我的两个虚拟机都有相同的主机名。

hostname -f

会回来

ubuntu

在两者上，这显然会导致 kubernetes 出现问题。

我更改了我的非主节点上的名称

hostnamectl set-hostname kminion

并在以下文件中：

/etc/hostname
/etc/hosts

一切顺利！

score 1 · Accepted Answer

我应该更改 IP 地址（更改为什么）？

是的，这通常是在默认路由用于对 Internet 进行 NAT 访问的 VM 上进行工作的方法。

您想使用桥接网络的 IP，对您来说似乎是192.168.100.17（但请仔细检查）。

首先，请尝试使用kubeadm init --apiserver-advertise-address 192.168.100.17，但这可能无法解决所有问题。

在你的输出中kubectl describe nodes，我可以看到这个

Addresses:
  InternalIP:  172.17.0.1
  Hostname:    ubuntu

所以你可能想确保 kubelet 也没有使用 NATed 接口，为此你需要使用 kubelet 的--node-ip标志。

但是，还有其他方法可以解决此问题，例如，如果您可以确保hostname -i返回桥接接口的 IP（您可以通过调整来完成/etc/hosts）。

kubernetes - Kubernetes - 主节点中的 kube-system pod 在工作节点加入后不断重启

2 回答 2

Related