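I'm trying to provision a new node pool with gvisor sandboxing in GKE. I use the GCP web console to add the new node pool, with the cos_containerd OS and the Enable gvisor Sandboxing checkbox ticked, but provisioning fails every time with an "Unknown Error" in the GCP console notifications. The nodes never join the K8S cluster.

The GCE VM seems to boot fine, and when I look at journalctl on the node, cloud-init appears to have completed without problems, but the kubelet doesn't seem to be able to start. I see error messages like this: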

Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.184163    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.284735    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.385229    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.485626    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.522961    1143 eviction_manager.go:251] eviction manager: failed to get summary stats: failed to get node info: node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz containerd[976]: time="2020-10-12T16:58:07.576735750Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.577353    1143 kubelet.go:2191] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.587824    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.989869    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:08 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:08.090287    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.296365    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.396933    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz node-problem-detector[1166]: F1012 16:58:09.449446    2481 main.go:71] cannot create certificate signing request: Post https://172.17.0.2/apis/certificates.k8s.io/v1beta1/certificatesigningrequests?timeout=5m0s: dial tcp 172.17.0.2:443: connect: no route 
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz node-problem-detector[1166]: E1012 16:58:09.450695    1166 manager.go:162] failed to update node conditions: Patch https://172.17.0.2/api/v1/nodes/gke-main-sanboxes-dd9b8d84-dmzz/status: getting credentials: exec: exit status 1
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.453825    2486 cache.go:125] failed reading existing private key: open /var/lib/kubelet/pki/kubelet-client.key: no such file or directory
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.543449    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found
Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.556623    2486 tpm.go:124] failed reading AIK cert: tpm2.NVRead(AIK cert): decoding NV_ReadPublic response: handle 1, error code 0xb : the handle is not correct for the use
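
Most of the journal output above is the same "node not found" message repeated, which makes the less frequent errors (CNI config, certificate signing request, missing client key) easy to miss. A minimal Python sketch for collapsing such output into distinct messages is shown below. It assumes the default journalctl short format and klog-style prefixes as seen in the excerpt; the `summarize` helper and the `SAMPLE` list are illustrative names, not part of any GKE tooling.

```python
import re
from collections import Counter

# Matches "<month> <day> <time> <host> <unit>[<pid>]: <message>".
LINE_RE = re.compile(
    r"^\w{3} +\d+ [\d:]+ \S+ (?P<unit>[\w.-]+)\[\d+\]: (?P<msg>.*)$"
)
# Matches a klog prefix such as "E1012 16:58:07.184163    1143 kubelet.go:2271] ".
KLOG_RE = re.compile(r"^[IWEF]\d{4} [\d:.]+ +\d+ [\w.-]+:\d+\] ")


def summarize(lines):
    """Count distinct (unit, message) pairs, ignoring timestamps and PIDs."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip truncated or non-journal lines
        msg = m.group("msg")
        k = KLOG_RE.match(msg)
        if k:
            msg = msg[k.end():]  # drop the klog prefix so repeats compare equal
        counts[(m.group("unit"), msg)] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    'Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.184163    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found',
    'Oct 12 16:58:07 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:07.284735    1143 kubelet.go:2271] node "gke-main-sanboxes-dd9b8d84-dmzz" not found',
    "Oct 12 16:58:09 gke-main-sanboxes-dd9b8d84-dmzz kubelet[1143]: E1012 16:58:09.453825    2486 cache.go:125] failed reading existing private key: open /var/lib/kubelet/pki/kubelet-client.key: no such file or directory",
]

for (unit, msg), n in summarize(SAMPLE).most_common():
    print(f"{n:4d}  {unit}: {msg}")
```

On a node, the same function could be fed the full journal, for example by piping `journalctl` output into a script that passes `sys.stdin` to `summarize`.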

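I'm not sure what is causing this, and I'd really like to be able to use autoscaling with this node pool, so I don't want to fix it manually on this one node and then have to repeat that for every new node that joins. How can I configure the node pool so that gvisor-based nodes provision themselves?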

My cluster details:

  • GKE version: 1.17.9-gke.6300
  • Cluster type: Regional
  • VPC-native
  • Private cluster
  • Shielded GKE Nodes

1 Answer

You can report issues with Google products via the link below:

You will need to choose: Create new Google Kubernetes Engine issue under the Compute section.


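I can confirm that I ran into the same issue when creating a cluster like the one described in the question (private, shielded, etc.):

  • Create a cluster with one node pool.
  • After the cluster has been created successfully, add a node pool with gvisor enabled.

Creating the cluster as above pushes the GKE cluster into the RECONCILING state: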
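
The second reproduction step can also be done from the command line rather than the web console. The sketch below only builds and prints the `gcloud` invocation, so it can be reviewed before running. The cluster name and region are taken from the output in this answer; the pool name `gvisor-pool` and the autoscaling bounds are placeholders, and `node_pool_create_cmd` is a hypothetical helper.

```python
import shlex


def node_pool_create_cmd(pool, cluster, region, min_nodes=1, max_nodes=3):
    """Return the gcloud argv for a gvisor-enabled, autoscaled node pool."""
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--region={region}",
        "--image-type=cos_containerd",  # sandboxing needs a containerd image
        "--sandbox=type=gvisor",        # same as the console checkbox
        "--enable-autoscaling",
        f"--min-nodes={min_nodes}",     # placeholder bounds, adjust as needed
        f"--max-nodes={max_nodes}",
    ]


# "gvisor-pool" is a placeholder; cluster and region match the listing below.
cmd = node_pool_create_cmd("gvisor-pool", "gke-gvisor", "europe-west3")
print(shlex.join(cmd))
```

Scripting the step this way makes the reproduction repeatable, which helps when attaching exact steps to an issue report.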

NAME        LOCATION      MASTER_VERSION   MASTER_IP       MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
gke-gvisor  europe-west3  1.17.9-gke.6300  XX.XXX.XXX.XXX  e2-medium     1.17.9-gke.6300  6          RECONCILING

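Changes in the cluster state:

  • Provisioning - creating the cluster
  • Running - cluster created
  • Reconciling - adding the node pool
  • Running - node pool added (for about a minute)
  • Reconciling - the cluster stayed in this state for about 25 minutes

The GCP Cloud Console (Web UI) reports: Repairing Cluster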
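
A sequence of state changes like the one above can be recorded by polling the cluster status. The following is a rough sketch, assuming `gcloud` is installed and authenticated; `fetch_status` and `watch_status` are hypothetical helper names, and the polling interval and count are arbitrary.

```python
import subprocess
import time


def fetch_status(cluster, region):
    """Return the cluster status string (e.g. RUNNING, RECONCILING) via gcloud."""
    out = subprocess.run(
        ["gcloud", "container", "clusters", "describe", cluster,
         f"--region={region}", "--format=value(status)"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def watch_status(fetch, interval=30.0, max_polls=60, sleep=time.sleep):
    """Poll `fetch` and record (poll_index, status) whenever the status changes."""
    transitions = []
    last = None
    for i in range(max_polls):
        status = fetch()
        if status != last:
            transitions.append((i, status))
            last = status
        if i + 1 < max_polls:
            sleep(interval)
    return transitions


# Example usage (requires gcloud credentials, so it is left commented out):
# changes = watch_status(lambda: fetch_status("gke-gvisor", "europe-west3"))
# print(changes)
```

Passing the fetch function in as a parameter keeps the polling loop independent of `gcloud`, so it can be exercised with a stub.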

Answered 2020-10-20T12:56:19.263