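I am trying to create a GKE cluster (version: 1.18.15-gke.1500) with an n1-highmem-4 node pool. Everything works fine until I try to install a Helm chart. The helm binary (version: 3.5.2) ends with the error "http2 connection lost", and GKE then triggers auto-repair mode. I don't understand why, because I have no problem creating some ConfigMaps with kubectl. Do you know where I can find logs about the pool machines or the GKE master?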
1 Answer
Without knowing exactly how this cluster was created, what resources were applied with Helm,
and what the cluster logs show (I explain how to retrieve them below), it is hard to pinpoint the issue and its cause.
Overview:
GKE's node auto-repair feature helps you keep the nodes in your cluster in a healthy, running state. When enabled, GKE makes periodic checks on the health state of each node in your cluster. If a node fails consecutive health checks over an extended time period, GKE initiates a repair process for that node.
-- Cloud.google.com: Kubernetes Engine: Docs: How to: Node auto repair: Overview
Answering the question posted:
Do you know if there is somewhere I can find logs about the pool machines or the GKE master?
Yes. There are ways to check your cluster's health as well as its logs.
GKE generates a log entry for automated repair events. You can check these entries by running:
gcloud container operations list
The output should look similar to the one below:
operation-XXXXXXXXXXXXX-XXXXXXXX CREATE_CLUSTER europe-west3-c example-cluster DONE 2021-03-07T11:59:55.133563829Z 2021-03-07T12:03:09.684215827Z
operation-YYYYYYYYYYYYY-YYYYYYYY AUTO_REPAIR_NODES europe-west3-c gke-example-cluster-default-pool-AAAAAAAA-AAAA DONE 2021-03-07T12:21:14.814774338Z 2021-03-07T12:24:15.6305881Z
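On a busy project the listing can be long, so it may help to keep only the auto-repair rows. The following is a minimal sketch, assuming the default tabular output shown above, where the operation type is the second whitespace-separated column. The function name filter_auto_repair is hypothetical, and OPERATION_ID and ZONE are placeholders to be copied from the listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only AUTO_REPAIR_NODES rows from the tabular
# output of `gcloud container operations list`, read from stdin.
# Assumption: the operation type is the second whitespace-separated column.
filter_auto_repair() {
  awk '$2 == "AUTO_REPAIR_NODES"'
}

# Usage (needs an authenticated gcloud, so it is left commented out):
#   gcloud container operations list | filter_auto_repair
#
# To inspect one operation in more detail, using values from the listing:
#   gcloud container operations describe OPERATION_ID --zone ZONE
```

The timestamps on the matching rows show when the repair started and finished, which can be compared with the time the Helm install failed.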
In addition, you can look up the logs of specific nodes in Google Cloud's operations suite (formerly Stackdriver).
You can access it by navigating to:
GCP Cloud Console (Web UI)
->Logging
->Upgrade
->Upgrade to the New Logs Explorer
and look for those logs using the filter below:
resource.type="k8s_node"
resource.labels.cluster_name="CLUSTER-NAME"
resource.labels.project_id="PROJECT-NAME"
resource.labels.location="ZONE"
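The same filter can also be used from the command line instead of the web UI. The sketch below joins the four conditions with AND and passes them to gcloud logging read. The helper name build_node_log_filter and the example values (example-cluster, my-project, europe-west3-c) are placeholders, not values from your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the k8s_node log filter as a single line.
# Arguments: 1 = cluster name, 2 = project ID, 3 = location (zone or region).
build_node_log_filter() {
  printf 'resource.type="k8s_node" AND resource.labels.cluster_name="%s" AND resource.labels.project_id="%s" AND resource.labels.location="%s"' \
    "$1" "$2" "$3"
}

# Usage (needs an authenticated gcloud, so it is left commented out):
#   gcloud logging read \
#     "$(build_node_log_filter example-cluster my-project europe-west3-c)" \
#     --project my-project --freshness 1d --limit 20
```

Narrowing --freshness to the window around the failed Helm install should keep the output manageable.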
Additional resources: