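I am provisioning a workload cluster with one control plane node and one worker node on top of OpenStack via Cluster API. However, the Kubernetes control plane fails to start properly on the control plane node.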

I can see that kube-apiserver keeps exiting and being recreated:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock ps -a
CONTAINER           IMAGE               CREATED              STATE               NAME                      ATTEMPT             POD ID
a729fdd387b0a       90d27391b7808       About a minute ago   Running             kube-apiserver            74                  88de61a0459f6
38b54a71cb0aa       90d27391b7808       3 minutes ago        Exited              kube-apiserver            73                  88de61a0459f6
24573a1c5adc5       b0f1517c1f4bb       18 minutes ago       Running             kube-controller-manager   4                   cc113aaae13b5
a2072b64cca1a       b0f1517c1f4bb       29 minutes ago       Exited              kube-controller-manager   3                   cc113aaae13b5
f26a531972518       d109c0821a2b9       5 hours ago          Running             kube-scheduler            1                   df1d15fd61a8f
a91b4c0ce9e27       303ce5db0e90d       5 hours ago          Running             etcd                      1                   16e1f0f5bb543
1565a1a7dedec       303ce5db0e90d       5 hours ago          Exited              etcd                      0                   16e1f0f5bb543
35ae23eb64f11       d109c0821a2b9       5 hours ago          Exited              kube-scheduler            0                   df1d15fd61a8f
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$

From the logs of the kube-apiserver container I can see "http: TLS handshake error from 172.24.4.159:50812: EOF":

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock logs -f a729fdd387b0a
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0416 20:32:25.730809       1 server.go:596] external host was not specified, using 10.6.0.9
I0416 20:32:25.744220       1 server.go:150] Version: v1.17.3
......
......
I0416 20:33:46.816189       1 dynamic_cafile_content.go:166] Starting request-header::/etc/kubernetes/pki/front-proxy-ca.crt
I0416 20:33:46.816832       1 dynamic_cafile_content.go:166] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
I0416 20:33:46.833031       1 dynamic_serving_content.go:129] Starting serving-cert::/etc/kubernetes/pki/apiserver.crt::/etc/kubernetes/pki/apiserver.key
I0416 20:33:46.853958       1 secure_serving.go:178] Serving securely on [::]:6443
......
......
I0416 20:33:51.784715       1 log.go:172] http: TLS handshake error from 172.24.4.159:60148: EOF
I0416 20:33:51.786804       1 log.go:172] http: TLS handshake error from 172.24.4.159:60150: EOF
I0416 20:33:51.788984       1 log.go:172] http: TLS handshake error from 172.24.4.159:60158: EOF
I0416 20:33:51.790695       1 log.go:172] http: TLS handshake error from 172.24.4.159:60210: EOF
I0416 20:33:51.792577       1 log.go:172] http: TLS handshake error from 172.24.4.159:60214: EOF
I0416 20:33:51.793861       1 log.go:172] http: TLS handshake error from 172.24.4.159:60202: EOF
I0416 20:33:51.805506       1 log.go:172] http: TLS handshake error from 10.6.0.9:35594: EOF
I0416 20:33:51.806056       1 log.go:172] http: TLS handshake error from 172.24.4.159:60120: EOF
......

From syslog I can see that the apiserver serving cert is signed for IP 172.24.4.159:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ grep "apiserver serving cert is signed for DNS names" /var/log/syslog 
Apr 16 15:25:56 ubu1910-medflavor-nolb3-control-plane-nh4hf cloud-init[652]: [certs] apiserver serving cert is signed for DNS names [ubu1910-medflavor-nolb3-control-plane-nh4hf kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.6.0.9 172.24.4.159]

From syslog I can also see that the kubelet service cannot reach the apiserver because of "net/http: TLS handshake timeout":

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ tail -F /var/log/syslog 
Apr 16 19:36:18 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:18.596206    1504 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get https://172.24.4.159:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:19.202346090Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:19.274089    1504 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Apr 16 19:36:20 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: W0416 19:36:20.600457    1504 status_manager.go:530] Failed to get status for pod "kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf_kube-system(24ec7abb1b94172adb053cf6fdd1648c)": Get https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf: net/http: TLS handshake timeout
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:24.336699210Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:24.379374    1504 controller.go:135] failed to ensure node lease exists, will retry in 7s, error: Get https://172.24.4.159:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubu1910-medflavor-nolb3-control-plane-nh4hf?timeout=10s: context deadline exceeded
......
......

I also tried to access the apiserver with curl, and this is what I see:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl http://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
Client sent an HTTP request to an HTTPS server.

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$

Is something wrong with the kube-apiserver certificate? Any idea how to continue troubleshooting?

1 Answer

If you want to see the details of the kube-api SSL certificate, you can use curl -k -v https://172.24.4.159:6443 or openssl s_client -connect 172.24.4.159:6443.
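To see at a glance which names and IPs a certificate covers, the openssl output can be narrowed down to subject, issuer, validity dates and Subject Alternative Names. The following is only a minimal sketch: `cert_summary` is a hypothetical helper name, not part of any tool, and it assumes a PEM certificate arrives on stdin.

```shell
#!/bin/sh
# cert_summary: hypothetical helper. Reads one PEM certificate on stdin and
# prints its subject, issuer, validity dates and Subject Alternative Names.
cert_summary() {
    pem=$(cat)
    printf '%s\n' "$pem" | openssl x509 -noout -subject -issuer -dates
    # grep exits non-zero when the certificate has no SAN extension;
    # "|| true" keeps that from being treated as a failure.
    printf '%s\n' "$pem" | openssl x509 -noout -text |
        grep -A1 'Subject Alternative Name' || true
}

# Against the live API server (address taken from the question):
#   openssl s_client -connect 172.24.4.159:6443 </dev/null 2>/dev/null | cert_summary
# Against the file on the control-plane node:
#   sudo cat /etc/kubernetes/pki/apiserver.crt | cert_summary
```

The SAN list printed here should match the names and IPs from the "apiserver serving cert is signed for" line in syslog.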

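You didn't mention how you provisioned the certificates. SSL in Kubernetes is a complex beast, and setting up the certificates and all the communication manually can be very painful. That is why people use kubeadm nowadays.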

TL;DR: you have to make sure that all certificates are signed by the CA in /etc/kubernetes/pki/ca.crt.

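Since you mention "single node", I assume the kubelet runs on the same server as a systemd unit? How is that kube-api container started? By the kubelet process itself, because you have a manifest in /etc/kubernetes/manifests?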

There are actually two ways of communication between kubelet and kube-api, and both are used at the same time:

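  1. kubelet connects and authenticates to kube-api using the information in the file given by the --kubeconfig=/etc/kubernetes/kubelet.conf parameter (you can check it with ps -aux | grep kubelet). In that file you will see the connection string, the CA certificate, and the client certificate + key. The kubelet presents the client certificate from the file to kube-api and verifies the server certificate against the CA from the same file. kube-api verifies the client certificate against the CA defined in its own --client-ca-file option.
  2. kube-api connects to kubelet using the --kubelet-client-certificate and --kubelet-client-key options. This is probably not where the problem is.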

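Since you can see SSL errors on the kube-api side and not on the kubelet side, I think there is a problem with the communication described in point 1, where kubelet connects and authenticates to kube-api. The errors are in the kube-api log, so I would say kube-api has a problem verifying the kubelet client certificate. So have a look at the file given by --kubeconfig=/etc/kubernetes/kubelet.conf. You can base64-decode the certificate and display its details with openssl or an online SSL certificate checker. The most important part is that it must be signed by the CA file defined in the kube-api --client-ca-file option.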
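The decoding step could look roughly like the sketch below. `kubeconfig_client_cert` is a hypothetical helper, and the parsing is deliberately naive: it assumes the kubeconfig has a single user entry and stores the certificate either inline as `client-certificate-data` (base64) or as a `client-certificate` file path.

```shell
#!/bin/sh
# kubeconfig_client_cert: hypothetical helper. Prints the PEM client
# certificate referenced by a kubeconfig file, handling both an embedded
# "client-certificate-data" field and a "client-certificate" file path.
kubeconfig_client_cert() {
    conf=$1
    data=$(awk '$1 == "client-certificate-data:" {print $2; exit}' "$conf")
    if [ -n "$data" ]; then
        printf '%s' "$data" | base64 -d
    else
        path=$(awk '$1 == "client-certificate:" {print $2; exit}' "$conf")
        [ -n "$path" ] || return 1
        cat "$path"
    fi
}

# On the control-plane node (paths from the answer and the apiserver log;
# run as root):
#   kubeconfig_client_cert /etc/kubernetes/kubelet.conf > /tmp/kubelet-client.pem
#   openssl x509 -in /tmp/kubelet-client.pem -noout -subject -issuer -dates
#   openssl verify -CAfile /etc/kubernetes/pki/ca.crt /tmp/kubelet-client.pem
```

The apiserver log in the question shows client-ca-bundle::/etc/kubernetes/pki/ca.crt, which is why that file is used as the CA in the last command; substitute whatever --client-ca-file points to on your node.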

Honestly, all of this takes a lot of effort, and the easiest approach is to throw everything away and use kubeadm to bootstrap a single-node cluster:

  1. Clean up your server from all the mess
  2. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
  3. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Answered 2021-04-16T22:54:32.047