kubernetes - Unable to connect to a Kubernetes cluster with kubectl, and liveness probes are failing

Question

Update

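We commented out the Django migrations and collectstatic in the Dockerfile and managed to make a new deployment (it passed the liveness/readiness probes). We assumed the problem was related to one of those steps, but we then restored `python manage.py migrate` and `python manage.py collectstatic` and everything kept working. So deployments work again, but we don't know why they stopped working in the first place.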

However, we still cannot use `kubectl`. We still get the timeout error, even from the GitLab interface.

We have an application running on a Kubernetes cluster managed by GitLab Auto DevOps. A few days ago, for some unknown reason, we could no longer connect to our pods using `kubectl`. We get `Error from server: error dialing backend: dial timeout, backstop`. To connect to a pod we use `kubectl -n <namespace> exec -it <pod> -- bash`.

At the same time, our deployments started failing because the liveness and readiness probes fail. Checking GKE, we see the following messages:

Readiness probe failed: Get "http://10.59.1.234:5000/": dial tcp 10.59.1.232:5000: connect: connection refused
Liveness probe failed: Get "http://10.59.1.234:5000/": dial tcp 10.59.1.232:5000: connect: connection refused

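I tried increasing `initialDelaySeconds`, the Helm variable that controls the probe's initial delay, but without success. After 5 minutes the deployment fails with a timeout error (`Error: release review-fix-run-pi-xrsd0t failed, and has been uninstalled due to atomic being set: timed out waiting for the condition`).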
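For context on why raising `initialDelaySeconds` on its own may only postpone the restart: the kubelet starts probing after the initial delay and restarts the container once the liveness probe has failed `failureThreshold` times in a row. The sketch below is a rough estimate only. It assumes the documented Kubernetes defaults (`periodSeconds: 10`, `failureThreshold: 3`), which the Helm chart may override, and it ignores kubelet jitter and probe timeouts. The function name is hypothetical.

```python
def earliest_restart_seconds(initial_delay: int,
                             period: int = 10,
                             failure_threshold: int = 3) -> int:
    """Rough lower bound on seconds between container start and a
    liveness-triggered restart when every probe fails immediately.

    Assumes Kubernetes' default periodSeconds (10) and failureThreshold (3)
    unless overridden; ignores kubelet jitter and probe timeouts.
    """
    # The first probe runs after initial_delay; each further failure
    # arrives one period later, until failure_threshold is reached.
    return initial_delay + (failure_threshold - 1) * period
```

For example, `earliest_restart_seconds(15)` gives 35, and `earliest_restart_seconds(120)` gives 140. If the application never starts listening on port 5000, a larger delay just moves the restart later.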

The application is still running, but we cannot make new deployments or access the pods.

Below is the output of `kubectl -n <namespace> describe pod <pod>`, run during the pipeline. A few minutes later, the pipeline fails.

IP:           10.59.1.234
IPs:
  IP:           10.59.1.234

Port:           5000/TCP
Host Port:      0/TCP
State:          Running

Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  75s               default-scheduler  Successfully assigned daeb5798-review-fix-run-pi-xrsd0t/review-fix-run-pi-xrsd0t-6c975b9f4d-wgc4r to gke-os-us-central1-default-pool-da55e92e-8ssdxx
  Normal   Pulling    55s               kubelet            Pulling image "registry.gitlab.com/fix-run-pipeline:6a43a82369e87eee4ad86023694167aef6886451"
  Normal   Pulled     53s               kubelet            Successfully pulled image "registry.gitlab.com/fix-run-pipeline:6a43a82369e87eee4ad86023694167aef6886451" in 2.819782241s
  Normal   Created    51s               kubelet            Created container auto-deploy-app
  Normal   Started    49s               kubelet            Started container auto-deploy-app
  Warning  Unhealthy  5s (x4 over 35s)  kubelet            Readiness probe failed: Get "http://10.59.1.234:5000/readiness/": dial tcp 10.59.1.234:5000: connect: connection refused
  Warning  Unhealthy  5s (x3 over 25s)  kubelet            Liveness probe failed: Get "http://10.59.1.234:5000/healthz/": dial tcp 10.59.1.234:5000: connect: connection refused
  Normal   Killing    5s                kubelet            Container auto-deploy-app failed liveness probe, will be restarted
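One thing that may help narrow this down: "connection refused" typically means the connection attempt reached the pod IP but nothing was accepting connections on port 5000, which is different from a timeout. Below is a minimal sketch for telling the two apart, assuming it is run from somewhere with network access to the pod IP (for example, another pod in the cluster). The helper name `check_tcp` is hypothetical.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection performs the TCP handshake; success means
        # something is listening on host:port.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but no process accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (often a routing/firewall issue).
        return "timeout"
    except OSError:
        # Any other network error (e.g. no route to host).
        return "error"
```

Calling `check_tcp("10.59.1.234", 5000)` with the pod IP and port from the events above would show whether the port is refusing connections (the application is not listening yet) or not answering at all.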

I was told that this issue is related to our cluster's internal network, but I don't know where to go from here.

We also noticed some warnings about pods in a not-ready state:

Any tips on how to solve or investigate this problem?

Thanks in advance.
