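While using GCP's Anthos Config Management, I keep running into an error where the admission-webhook pods die with status OOMKilled. I tried adjusting the memory request in the pod spec, and that seemed to work for a while. However, every object field in the config-management-system namespace is owned by a controller (via ManagedFields), so I cannot change the admission-webhook spec permanently: the Deployment spec keeps being reconciled back to the original.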

Can anyone help me with this?

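  • Can I force-update only some of the managed fields, and is that actually a good idea? I would rather not hack a Google-provided pod just to change its resource spec.
  • Even if I force an update of the managed fields (kubectl apply -f .. --force-conflicts --server-side), the other field manager restores the original spec. Is there any way to deal with this situation?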
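
For context, the field managers that own parts of the Deployment can be listed with something like the following. This is only a minimal sketch: it assumes kubectl 1.21 or later (for --show-managed-fields) and that the Deployment is named admission-webhook, which I am inferring from the ReplicaSet name. list_field_managers is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print each field manager recorded on a Deployment,
# together with the operation (Apply or Update) it used.
# Assumption: the Deployment is called "admission-webhook" unless a
# different name is passed as the first argument.
list_field_managers() {
  kubectl get deployment "${1:-admission-webhook}" \
    -n config-management-system \
    --show-managed-fields \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\n"}{end}'
}
```

Calling list_field_managers against the cluster should show which manager reapplies the spec after a forced server-side apply.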

Pod status:

$ kubectl get pods -n config-management-system -o wide
NAME                                 READY   STATUS             RESTARTS   AGE     IP           NODE                                     NOMINATED NODE   READINESS GATES
admission-webhook-76b67c9f8c-vt8n6   0/1     CrashLoopBackOff   6          10m     10.40.0.76   gke-seoul-a-default-pool-e2dcfaa0-8tlw   <none>           <none>
admission-webhook-76b67c9f8c-wdlzg   0/1     CrashLoopBackOff   6          10m     10.40.1.95   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
reconciler-manager-7f95dbf7-ss5z2    2/2     Running            0          6h57m   10.40.1.88   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
root-reconciler-5ddb78479c-lmkmk     3/3     Running            0          6h53m   10.40.1.89   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>

Logs from one of the crashed pods:

$ kubectl logs -f admission-webhook-76b67c9f8c-vt8n6 -n config-management-system --previous
I0824 08:32:16.821366       1 setup.go:15] Build Version:
I0824 08:32:16.821436       1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:17.922013       1 request.go:655] Throttling request took 1.007744493s, request: GET:https://10.44.0.1:443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0824 08:32:21.132165       1 deleg.go:130] controller-runtime/metrics "level"=0 "msg"="metrics server is starting to listen"  "addr"=":8080"
I0824 08:32:21.132367       1 deleg.go:130] setup "level"=0 "msg"="creating certificate rotator for webhook"
I0824 08:32:21.132498       1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:21.132660       1 deleg.go:130] setup "level"=0 "msg"="waiting for certificate rotator"
I0824 08:32:21.132845       1 internal.go:385] controller-runtime/manager "level"=0 "msg"="starting metrics server"  "path"="/metrics"
I0824 08:32:21.132937       1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource"  "source"={}
I0824 08:32:21.133137       1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource"  "source"={}
I0824 08:32:21.133295       1 deleg.go:130] cert-rotation "level"=0 "msg"="starting cert rotator controller"
I0824 08:32:21.233606       1 controller.go:173] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting Controller"
I0824 08:32:21.233650       1 controller.go:211] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting workers"  "worker count"=1
I0824 08:32:21.234167       1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:21.234564       1 deleg.go:130] cert-rotation "level"=0 "msg"="no cert refresh needed"
I0824 08:32:21.234611       1 deleg.go:130] cert-rotation "level"=0 "msg"="certs are ready in /certs"
I0824 08:32:21.239218       1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:22.659789       1 deleg.go:130] cert-rotation "level"=0 "msg"="CA certs are injected to webhooks"
I0824 08:32:22.660065       1 deleg.go:130] setup "level"=0 "msg"="registering validating webhook"

Pod description:

Name:         admission-webhook-76b67c9f8c-vt8n6
Namespace:    config-management-system
Priority:     0
Node:         gke-seoul-a-default-pool-e2dcfaa0-8tlw/10.178.0.6
Start Time:   Tue, 24 Aug 2021 17:25:44 +0900
Labels:       app=admission-webhook
              pod-template-hash=76b67c9f8c
Annotations:  <none>
Status:       Running
IP:           10.40.0.76
IPs:
  IP:           10.40.0.76
Controlled By:  ReplicaSet/admission-webhook-76b67c9f8c
Containers:
  admission-webhook:
    Container ID:  containerd://95084e16bedb0f75ed18c24b2b16a815dda0fef87bbaea00df606b9093cca197
    Image:         gcr.io/config-management-release/admission-webhook:v1.8.1-rc.2
    Image ID:      gcr.io/config-management-release/admission-webhook@sha256:89783f083940d75cc4b7c51428966751a29e7edd606872b23be090a0a1655ecc
    Port:          10250/TCP
    Host Port:     0/TCP
    Command:
      /admission-webhook
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 27 Aug 2021 20:09:53 +0900
      Finished:     Fri, 27 Aug 2021 20:10:04 +0900
    Ready:          False
    Restart Count:  854
    Limits:
      cpu:     200m
      memory:  100Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from admission-webhook-token-gt7qz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-cert
    Optional:    false
  admission-webhook-token-gt7qz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-token-gt7qz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                       From     Message
  ----     ------   ----                      ----     -------
  Warning  BackOff  3m25s (x19992 over 3d2h)  kubelet  Back-off restarting failed container

Environment:

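  • ACM version 1.8.1
  • Unstructured repo format, synced from GitHub with token authentication.
  • The GitHub repository is for testing only and contains a single small ConfigMap. Syncing works, but the admission webhook does not stop me from deleting the ConfigMap with kubectl, even though the ConfigMap is restored correctly after deletion.
  • Reinstalled from scratch several times.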