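I created a Kubernetes cluster on AWS using KOPS. The cluster is created without any issues and runs fine for 10-15 hours. I have deployed SAP Vora2.1 on this cluster. However, typically after 12-15 hours, the KOPS cluster runs into problems with kube-proxy and kube-dns. These pods either go down or show up in Completed state, and there are a lot of restarts as well. This eventually causes problems in my application pods, and the application goes down too. The application uses consul for service discovery, but because the underlying Kubernetes services are not working properly, the application does not reach a steady state even when I try to restore the kube-proxy/kube-dns pods.

This is a 3-node cluster (1 master and 2 nodes) set up in fully autoscaling mode. The overlay network uses the default kubenet. Below is a snapshot of the pod statuses after the system has run into issues: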

[root@ip-172-31-18-162 ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                                                       READY     STATUS                                                 RESTARTS   AGE
infyvora      vora-catalog-1549734119-cfnhz                                              0/2       CrashLoopBackOff                                       188        20h
infyvora      vora-consul-0                                                              0/1       CrashLoopBackOff                                       101        20h
infyvora      vora-consul-1                                                              1/1       Running                                                34         20h
infyvora      vora-consul-2                                                              0/1       CrashLoopBackOff                                       95         20h
infyvora      vora-deployment-operator-293895365-4b3t6                                   0/1       Completed                                              104        20h
infyvora      vora-disk-0                                                                1/2       CrashLoopBackOff                                       187        20h
infyvora      vora-dlog-0                                                                0/2       CrashLoopBackOff                                       226        20h
infyvora      vora-dlog-1                                                                1/2       CrashLoopBackOff                                       155        20h
infyvora      vora-doc-store-2451237348-dkrm6                                            0/2       CrashLoopBackOff                                       229        20h
infyvora      vora-elasticsearch-logging-v1-444540252-mwfrz                              0/1       CrashLoopBackOff                                       100        20h
infyvora      vora-elasticsearch-logging-v1-444540252-vrr63                              1/1       Running                                                14         20h
infyvora      vora-elasticsearch-retention-policy-137762458-ns5pc                        1/1       Running                                                13         20h
infyvora      vora-fluentd-kubernetes-v1.21-9f4pt                                        1/1       Running                                                12         20h
infyvora      vora-fluentd-kubernetes-v1.21-s2t1j                                        0/1       CrashLoopBackOff                                       99         20h
infyvora      vora-grafana-2929546178-vrf5h                                              1/1       Running                                                13         20h
infyvora      vora-graph-435594712-47lcg                                                 0/2       CrashLoopBackOff                                       157        20h
infyvora      vora-kibana-logging-3693794794-2qn86                                       0/1       CrashLoopBackOff                                       99         20h
infyvora      vora-landscape-2532068267-w1f5n                                            0/2       CrashLoopBackOff                                       232        20h
infyvora      vora-nats-streaming-1569990702-kcl1v                                       1/1       Running                                                13         20h
infyvora      vora-prometheus-node-exporter-k4c3g                                        0/1       CrashLoopBackOff                                       102        20h
infyvora      vora-prometheus-node-exporter-xp511                                        1/1       Running                                                13         20h
infyvora      vora-prometheus-pushgateway-399610745-tcfk7                                0/1       CrashLoopBackOff                                       103        20h
infyvora      vora-prometheus-server-3955170982-xpct0                                    2/2       Running                                                24         20h
infyvora      vora-relational-376953862-w39tc                                            0/2       CrashLoopBackOff                                       237        20h
infyvora      vora-security-operator-2514524099-7ld0k                                    0/1       CrashLoopBackOff                                       103        20h
infyvora      vora-thriftserver-409431919-8c1x9                                          2/2       Running                                                28         20h
infyvora      vora-time-series-1188816986-f2fbq                                          1/2       CrashLoopBackOff                                       184        20h
infyvora      vora-tools5tlpt-100252330-mrr9k                                            0/1       rpc error: code = 4 desc = context deadline exceeded   272        17h
infyvora      vora-tools6zr3m-3592177467-n7sxd                                           0/1       Completed                                              1          20h
infyvora      vora-tx-broker-4168728922-hf8jz                                            0/2       CrashLoopBackOff                                       151        20h
infyvora      vora-tx-coordinator-3910571185-l0r4n                                       0/2       CrashLoopBackOff                                       184        20h
infyvora      vora-tx-lock-manager-2734670982-bn7kk                                      0/2       Completed                                              228        20h
infyvora      vsystem-1230763370-5ckr0                                                   0/1       CrashLoopBackOff                                       115        20h
infyvora      vsystem-auth-1068224543-0g59w                                              0/1       CrashLoopBackOff                                       102        20h
infyvora      vsystem-vrep-1427606801-zprlr                                              0/1       CrashLoopBackOff                                       121        20h
kube-system   dns-controller-3110272648-chwrs                                            1/1       Running                                                0          22h
kube-system   etcd-server-events-ip-172-31-64-102.ap-southeast-1.compute.internal        1/1       Running                                                0          22h
kube-system   etcd-server-ip-172-31-64-102.ap-southeast-1.compute.internal               1/1       Running                                                0          22h
kube-system   kube-apiserver-ip-172-31-64-102.ap-southeast-1.compute.internal            1/1       Running                                                0          22h
kube-system   kube-controller-manager-ip-172-31-64-102.ap-southeast-1.compute.internal   1/1       Running                                                0          22h
kube-system   kube-dns-1311260920-cm1fs                                                  0/3       Completed                                              309        22h
kube-system   kube-dns-1311260920-hm5zd                                                  3/3       Running                                                39         22h
kube-system   kube-dns-autoscaler-1818915203-wmztj                                       1/1       Running                                                12         22h
kube-system   kube-proxy-ip-172-31-64-102.ap-southeast-1.compute.internal                1/1       Running                                                0          22h
kube-system   kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal                0/1       CrashLoopBackOff                                       98         22h
kube-system   kube-proxy-ip-172-31-64-15.ap-southeast-1.compute.internal                 1/1       Running                                                13         22h
kube-system   kube-scheduler-ip-172-31-64-102.ap-southeast-1.compute.internal            1/1       Running                                                0          22h
kube-system   tiller-deploy-352283156-97hhb                                              1/1       Running                                                34         22h
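
In case it helps with triage, here is a rough sketch (not a fix) of how a listing like the one above could be filtered down to the pods that are in trouble, sorted by restart count. The function names `parse_pods` and `unhealthy` are made up for illustration and are not part of kubectl or KOPS. It assumes the six-column layout shown above, with RESTARTS as a plain integer; since STATUS can contain spaces (as in the `rpc error` row), it is taken as everything between READY and the last two columns.

```python
# A few rows copied from the listing above (whitespace shortened).
SAMPLE = """\
NAMESPACE     NAME                              READY   STATUS   RESTARTS   AGE
kube-system   kube-dns-1311260920-cm1fs         0/3     Completed   309   22h
kube-system   kube-dns-1311260920-hm5zd         3/3     Running   39   22h
infyvora      vora-tools5tlpt-100252330-mrr9k   0/1     rpc error: code = 4 desc = context deadline exceeded   272   17h
infyvora      vora-consul-0                     0/1     CrashLoopBackOff   101   20h
"""


def parse_pods(text):
    """Parse `kubectl get pods --all-namespaces` text output into dicts.

    Assumption: columns are NAMESPACE NAME READY STATUS RESTARTS AGE and
    only STATUS may contain spaces.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything too short to be a pod row.
        if len(parts) < 6 or parts[0] == "NAMESPACE":
            continue
        namespace, name, ready = parts[:3]
        restarts, age = parts[-2:]
        rows.append({
            "namespace": namespace,
            "name": name,
            "ready": ready,
            "status": " ".join(parts[3:-2]),  # may be several words
            "restarts": int(restarts),
            "age": age,
        })
    return rows


def unhealthy(rows):
    """Return pods that are not fully ready or not Running, most restarts first."""
    def is_bad(row):
        ready, total = row["ready"].split("/")
        return ready != total or row["status"] != "Running"

    return sorted((r for r in rows if is_bad(r)),
                  key=lambda r: r["restarts"], reverse=True)


if __name__ == "__main__":
    for pod in unhealthy(parse_pods(SAMPLE)):
        print(pod["namespace"], pod["name"], pod["status"], pod["restarts"])
```

The same functions could be fed the full saved output of `kubectl get pods --all-namespaces` instead of `SAMPLE`.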

Has anyone faced similar issues with KOPS Kubernetes on AWS? I would appreciate any pointers for resolving this.

Regards, Deepak
