
My cluster certificates have expired, and now I cannot run any kubectl commands.

root@node1:~# kubectl get ns
Unable to connect to the server: x509: certificate has expired or is not yet valid
root@node1:~# 

I created this cluster with Kubespray; the kubeadm version is v1.16.3 and kubernetesVersion is v1.16.3.

root@node1:~# kubeadm alpha certs check-expiration
failed to load existing certificate apiserver-etcd-client: open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory
To see the stack trace of this error execute with --v=5 or higher
root@node1:~# 

I also found that the apiserver-etcd-client.crt and apiserver-etcd-client.key files are missing from the /etc/kubernetes/pki directory.

root@node1:/etc/kubernetes/pki# ls -ltr
total 72
-rw------- 1 root root 1679 Jan 24 2020 ca.key
-rw-r--r-- 1 root root 1025 Jan 24 2020 ca.crt
-rw-r----- 1 root root 1679 Jan 24 2020 apiserver.key.old
-rw-r----- 1 root root 1513 Jan 24 2020 apiserver.crt.old
-rw------- 1 root root 1679 Jan 24 2020 apiserver.key
-rw-r--r-- 1 root root 1513 Jan 24 2020 apiserver.crt
-rw------- 1 root root 1675 Jan 24 2020 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1099 Jan 24 2020 apiserver-kubelet-client.crt
-rw-r----- 1 root root 1675 Jan 24 2020 apiserver-kubelet-client.key.old
-rw-r----- 1 root root 1099 Jan 24 2020 apiserver-kubelet-client.crt.old
-rw------- 1 root root 1679 Jan 24 2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1038 Jan 24 2020 front-proxy-ca.crt
-rw-r----- 1 root root 1675 Jan 24 2020 front-proxy-client.key.old
-rw-r----- 1 root root 1058 Jan 24 2020 front-proxy-client.crt.old
-rw------- 1 root root 1675 Jan 24 2020 front-proxy-client.key
-rw-r--r-- 1 root root 1058 Jan 24 2020 front-proxy-client.crt
-rw------- 1 root root 451 Jan 24 2020 sa.pub
-rw------- 1 root root 1679 Jan 24 2020 sa.key
root@node1:/etc/kubernetes/pki#
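
Since `kubeadm alpha certs check-expiration` fails here, a quick alternative is to ask openssl directly which certificates have expired. A minimal sketch (`check_cert_expiry` is a hypothetical helper name; the directory argument matches the listing above):

```shell
#!/bin/sh
# check_cert_expiry: print the notAfter date of every *.crt in a directory.
check_cert_expiry() {
    dir="$1"
    for crt in "$dir"/*.crt; do
        [ -e "$crt" ] || continue     # skip if the glob matched nothing
        printf '%s: ' "$crt"
        openssl x509 -noout -enddate -in "$crt"
    done
}

# On the affected node:
check_cert_expiry /etc/kubernetes/pki
```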

I tried the following commands, but they had no effect and all returned errors:

#sudo kubeadm alpha certs renew all
#kubeadm alpha phase certs apiserver-etcd-client
#kubeadm alpha certs apiserver-etcd-client --config /etc/kubernetes/kubeadm-config.yaml

Kubespray command:

#ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml

The command above ends with the following error:

FAILED! => {"attempts": 5, "changed": true, "cmd": ["/usr/local/bin/kubeadm", "--kubeconfig", "/etc/kubernetes/admin.conf", "token", "create"], "delta": "0:01:15.058756", "end": "2021-02-05 13:32:51.656901", "msg": "non-zero return code", "rc": 1, "start": "2021-02-05 13:31:36.598145", "stderr": "timed out waiting for the condition\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["timed out waiting for the condition", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

# cat /etc/kubernetes/kubeadm-config.yaml 
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: master1_IP
  bindPort: 6443
certificateKey: xxx
nodeRegistration:
  name: node1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/dockershim.sock
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
clusterName: cluster.local
etcd:
  external:
      endpoints:
      - https://master1:2379
      - https://master2:2379
      - https://master3:2379
      caFile: /etc/ssl/etcd/ssl/ca.pem
      certFile: /etc/ssl/etcd/ssl/node-node1.pem
      keyFile: /etc/ssl/etcd/ssl/node-node1-key.pem
dns:
  type: CoreDNS
  imageRepository: docker.io/coredns
  imageTag: 1.6.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: IP/18
  podSubnet: IP/18
kubernetesVersion: v1.16.3
controlPlaneEndpoint: master1_IP:6443
certificatesDir: /etc/kubernetes/ssl
imageRepository: gcr.io/google-containers
apiServer:

2 Answers


First, you need to renew the expired certificates. Use kubeadm to do this:

kubeadm alpha certs renew apiserver
kubeadm alpha certs renew apiserver-kubelet-client
kubeadm alpha certs renew front-proxy-client

Next, generate the new kubeconfig files:

kubeadm alpha kubeconfig user --client-name kubernetes-admin --org system:masters > /etc/kubernetes/admin.conf
kubeadm alpha kubeconfig user --client-name system:kube-controller-manager > /etc/kubernetes/controller-manager.conf
# instead of $(hostname) you may need to pass the name of the master node as in "/etc/kubernetes/kubelet.conf" file.
kubeadm alpha kubeconfig user --client-name system:node:$(hostname) --org system:nodes > /etc/kubernetes/kubelet.conf 
kubeadm alpha kubeconfig user --client-name system:kube-scheduler > /etc/kubernetes/scheduler.conf
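
To sanity-check a regenerated kubeconfig, you can decode its embedded client certificate and inspect the subject and expiry. A minimal sketch (`kubeconfig_cert_expiry` is a hypothetical helper name; it assumes the kubeconfig embeds the certificate as base64 `client-certificate-data`, which is what the commands above produce):

```shell
#!/bin/sh
# kubeconfig_cert_expiry: decode the base64 client-certificate-data from a
# kubeconfig and print the certificate's subject and expiry via openssl.
kubeconfig_cert_expiry() {
    grep 'client-certificate-data' "$1" \
        | awk '{print $2}' \
        | base64 -d \
        | openssl x509 -noout -subject -enddate
}

# On the node this would be, e.g.:
# kubeconfig_cert_expiry /etc/kubernetes/admin.conf
```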

Copy the new kubernetes-admin kubeconfig file:

cp /etc/kubernetes/admin.conf ~/.kube/config

Finally, you need to restart kube-apiserver, kube-controller-manager and kube-scheduler. You can use the following commands, or simply reboot the master node:

sudo kill -s SIGHUP $(pidof kube-apiserver)
sudo kill -s SIGHUP $(pidof kube-controller-manager)
sudo kill -s SIGHUP $(pidof kube-scheduler)
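
If sending SIGHUP does not restart the components, another common approach is to let the kubelet recreate them, assuming the control plane runs as kubelet static pods (the kubeadm default). A sketch (`restart_static_pods` is a hypothetical helper name; the second argument exists only to make the wait configurable):

```shell
#!/bin/sh
# restart_static_pods: move the static pod manifests out of the manifest
# directory and back, so the kubelet stops and then recreates the pods.
# Assumption: the control plane runs as kubelet static pods (kubeadm default).
restart_static_pods() {
    manifests="${1:-/etc/kubernetes/manifests}"
    wait_secs="${2:-20}"
    tmp=$(mktemp -d)
    mv "$manifests"/*.yaml "$tmp"/
    sleep "$wait_secs"            # give the kubelet time to stop the pods
    mv "$tmp"/*.yaml "$manifests"/
    rmdir "$tmp"
}
```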

You can also find more information on GitHub; this answer may be of great help to you.

Answered on 2021-02-11T17:34:36.453

In my case, I use AKS (Azure Kubernetes Service); to fix this error I ran the following command:

az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME

See the following link: https://docs.microsoft.com/en-us/azure/aks/certificate-rotation

Answered on 2021-08-30T22:20:38.093