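While using GCP's Anthos Config Management, I keep running into admission-webhook pods that die in the OOMKilled state. I tried raising the memory request in the pod spec, and that seemed to work for a while. However, because every object field in the config-management-system namespace is managed by a controller (ManagedFields), I can't change the admission-webhook spec permanently: the Deployment spec keeps being reconciled back to the original.
Can anyone help me?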
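- Can I force-update only some of the managed fields, and is that actually a good idea? I would rather not hack a Google-provided pod just to change its resource spec.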
- Even if I force-update the managed fields (kubectl apply -f .. --force-conflicts --server-side), another manager reverts them to the original spec. Is there any way to deal with this situation?
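To see which field manager owns the container resources before forcing a change, the managedFields metadata can be inspected. The following is a minimal sketch, not the method ACM itself uses. It assumes the Deployment has been dumped as JSON (for example with kubectl get deployment admission-webhook -n config-management-system -o json --show-managed-fields, on a kubectl version that supports that flag). The SAMPLE object and the manager names in it are hypothetical and stand in for that output.

```python
def managers_for_path(obj, path):
    """Return (manager, operation) pairs whose fieldsV1 tree contains `path`."""
    owners = []
    for entry in obj.get("metadata", {}).get("managedFields", []):
        node = entry.get("fieldsV1", {})
        for key in path:
            # Walk down the fieldsV1 tree one key at a time.
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            owners.append((entry.get("manager"), entry.get("operation")))
    return owners


# fieldsV1 path to the resources block of the admission-webhook container.
RESOURCES_PATH = [
    "f:spec", "f:template", "f:spec", "f:containers",
    'k:{"name":"admission-webhook"}', "f:resources",
]

# Hypothetical sample data; real output will list different managers.
SAMPLE = {
    "metadata": {
        "managedFields": [
            {
                "manager": "example-operator",  # hypothetical manager name
                "operation": "Apply",
                "fieldsV1": {"f:spec": {"f:template": {"f:spec": {"f:containers": {
                    'k:{"name":"admission-webhook"}': {
                        "f:resources": {"f:limits": {"f:memory": {}}}
                    }
                }}}}},
            },
            {
                "manager": "manual-edit",  # hypothetical manager name
                "operation": "Update",
                "fieldsV1": {"f:metadata": {"f:annotations": {}}},
            },
        ]
    }
}

if __name__ == "__main__":
    for manager, operation in managers_for_path(SAMPLE, RESOURCES_PATH):
        print(f"{manager} ({operation}) owns the resources field")
```

If a manager other than your own shows up for this path, a forced apply only takes ownership until that manager writes the field again, which matches the revert behaviour described above.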
Pod status:
$ kubectl get pods -n config-management-system -o wide
NAME                                 READY   STATUS             RESTARTS   AGE     IP           NODE                                     NOMINATED NODE   READINESS GATES
admission-webhook-76b67c9f8c-vt8n6   0/1     CrashLoopBackOff   6          10m     10.40.0.76   gke-seoul-a-default-pool-e2dcfaa0-8tlw   <none>           <none>
admission-webhook-76b67c9f8c-wdlzg   0/1     CrashLoopBackOff   6          10m     10.40.1.95   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
reconciler-manager-7f95dbf7-ss5z2    2/2     Running            0          6h57m   10.40.1.88   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
root-reconciler-5ddb78479c-lmkmk     3/3     Running            0          6h53m   10.40.1.89   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
And the logs of the dead pod:
$ kubectl logs -f admission-webhook-76b67c9f8c-vt8n6 -n config-management-system --previous
I0824 08:32:16.821366 1 setup.go:15] Build Version:
I0824 08:32:16.821436 1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:17.922013 1 request.go:655] Throttling request took 1.007744493s, request: GET:https://10.44.0.1:443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0824 08:32:21.132165 1 deleg.go:130] controller-runtime/metrics "level"=0 "msg"="metrics server is starting to listen" "addr"=":8080"
I0824 08:32:21.132367 1 deleg.go:130] setup "level"=0 "msg"="creating certificate rotator for webhook"
I0824 08:32:21.132498 1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:21.132660 1 deleg.go:130] setup "level"=0 "msg"="waiting for certificate rotator"
I0824 08:32:21.132845 1 internal.go:385] controller-runtime/manager "level"=0 "msg"="starting metrics server" "path"="/metrics"
I0824 08:32:21.132937 1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource" "source"={}
I0824 08:32:21.133137 1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource" "source"={}
I0824 08:32:21.133295 1 deleg.go:130] cert-rotation "level"=0 "msg"="starting cert rotator controller"
I0824 08:32:21.233606 1 controller.go:173] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting Controller"
I0824 08:32:21.233650 1 controller.go:211] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting workers" "worker count"=1
I0824 08:32:21.234167 1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:21.234564 1 deleg.go:130] cert-rotation "level"=0 "msg"="no cert refresh needed"
I0824 08:32:21.234611 1 deleg.go:130] cert-rotation "level"=0 "msg"="certs are ready in /certs"
I0824 08:32:21.239218 1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:22.659789 1 deleg.go:130] cert-rotation "level"=0 "msg"="CA certs are injected to webhooks"
I0824 08:32:22.660065 1 deleg.go:130] setup "level"=0 "msg"="registering validating webhook"
Output of kubectl describe for the pod:
Name:         admission-webhook-76b67c9f8c-vt8n6
Namespace:    config-management-system
Priority:     0
Node:         gke-seoul-a-default-pool-e2dcfaa0-8tlw/10.178.0.6
Start Time:   Tue, 24 Aug 2021 17:25:44 +0900
Labels:       app=admission-webhook
              pod-template-hash=76b67c9f8c
Annotations:  <none>
Status:       Running
IP:           10.40.0.76
IPs:
  IP:           10.40.0.76
Controlled By:  ReplicaSet/admission-webhook-76b67c9f8c
Containers:
  admission-webhook:
    Container ID:  containerd://95084e16bedb0f75ed18c24b2b16a815dda0fef87bbaea00df606b9093cca197
    Image:         gcr.io/config-management-release/admission-webhook:v1.8.1-rc.2
    Image ID:      gcr.io/config-management-release/admission-webhook@sha256:89783f083940d75cc4b7c51428966751a29e7edd606872b23be090a0a1655ecc
    Port:          10250/TCP
    Host Port:     0/TCP
    Command:
      /admission-webhook
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 27 Aug 2021 20:09:53 +0900
      Finished:     Fri, 27 Aug 2021 20:10:04 +0900
    Ready:          False
    Restart Count:  854
    Limits:
      cpu:     200m
      memory:  100Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from admission-webhook-token-gt7qz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-cert
    Optional:    false
  admission-webhook-token-gt7qz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-token-gt7qz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                       From     Message
  ----     ------   ----                      ----     -------
  Warning  BackOff  3m25s (x19992 over 3d2h)  kubelet  Back-off restarting failed container
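For readers decoding the describe output above: exit codes above 128 conventionally mean the process was terminated by signal (code - 128), so 137 corresponds to signal 9. The sketch below works through that arithmetic and converts the Mi quantities shown under Limits and Requests into bytes. The helper names are made up for illustration, and the quantity parser deliberately handles only the Mi suffix.

```python
import signal


def signal_from_exit_code(exit_code):
    """Map a container exit code above 128 to the name of the terminating signal."""
    if exit_code <= 128:
        return None  # the process exited on its own, not via a signal
    return signal.Signals(exit_code - 128).name


def mebibytes_to_bytes(quantity):
    """Convert a quantity such as '100Mi' to bytes (only the Mi suffix is handled)."""
    if not quantity.endswith("Mi"):
        raise ValueError(f"unsupported quantity: {quantity}")
    return int(quantity[:-2]) * 1024 ** 2


if __name__ == "__main__":
    print("exit code 137 ->", signal_from_exit_code(137))
    print("limit   100Mi ->", mebibytes_to_bytes("100Mi"), "bytes")
    print("request  20Mi ->", mebibytes_to_bytes("20Mi"), "bytes")
```

On Linux, signal 9 is SIGKILL, which is consistent with the OOMKilled reason. Note that the memory limit (100Mi here), rather than the request, is the ceiling the container is killed for exceeding, so raising only the request would likely not be enough.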
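Additional information:
- ACM version 1.8.1.
- Unstructured repository, hosted on GitHub and accessed with a token.
- The GitHub repository is for testing only and contains a single small ConfigMap. It syncs fine, but the admission webhook does not stop me from deleting the ConfigMap with kubectl, even though the ConfigMap is restored correctly after deletion.
- I have reinstalled from scratch several times.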