我正在尝试使用 GKE 中的 gvisor 沙盒配置一个新的节点池。我使用 GCP Web 控制台添加新节点池,使用cos_containerd
操作系统并选中 Enable gvisor Sandboxing 复选框,但每次配置节点池都会失败,并在 GCP 控制台通知中显示“未知错误”。节点永远不会加入 K8S 集群。
GCE VM 似乎可以正常启动,当我查看journalctl
节点时,我看到它cloud-init
似乎已经完成得很好,但是 kubelet 似乎无法启动。我看到这样的错误消息:
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.184163 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.284735 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.385229 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.485626 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.522961 1143 eviction_manager.go:251] eviction manager: failed to get summary stats: failed to get node info: node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz containerd[976]: time="2020-10-12T16:58:07.576735750Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.577353 1143 kubelet.go:2191] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.587824 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.989869 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:08 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:08.090287 1143
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.296365 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.396933 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz node-problem-detector[1166]: F1012 16:58:09.449446 2481 main.go:71] cannot create certificate signing request: Post https://172.17.0.2/apis/certificates.k8s.io/v1beta1/certificatesigningrequests?timeout=5m0s: dial tcp 172.17.0.2:443: connect: no route
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz node-problem-detector[1166]: E1012 16:58:09.450695 1166 manager.go:162] failed to update node conditions: Patch https://172.17.0.2/api/v1/nodes/gke-main-sanboxes-dd9b8d84-dmzz/status: getting credentials: exec: exit status 1
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.453825 2486 cache.go:125] failed reading existing private key: open /var/lib/kubelet/pki/kubelet-client.key: no such file or directory
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.543449 1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.556623 2486 tpm.go:124] failed reading AIK cert: tpm2.NVRead(AIK cert): decoding NV_ReadPublic response: handle 1, error code 0xb : the handle is not correct for the use
我不太确定是什么原因造成的,我真的很希望能够对这个节点池使用自动缩放,所以我不想只为这个节点手动修复它,而对任何新节点都必须这样做加入的节点。如何配置节点池,以便基于 gvisor 的节点自行配置?
我的集群详细信息:
- GKE 版本:1.17.9-gke.6300
- 集群类型:区域
- VPC 原生
- 私有集群
- 屏蔽 GKE 节点