amazon-web-services - AWS Load Balancer deployment fails

Question

score 1 · Accepted Answer

I found the answer here. A Fargate deployment needs the region and the VPC ID.

helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
    --set clusterName=<cluster-name> \
    --set serviceAccount.create=false \
    --set region=<region-code> \
    --set vpcId=<vpc-xxxxxxxx> \
    --set serviceAccount.name=aws-load-balancer-controller \
    -n kube-system

score 1 · Accepted Answer

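Looking at the current LB controller manifest, I found that the LB controller Pod spec has no readiness probe, only a liveness probe. Without a readiness probe, the Pod is reported Ready as soon as its container is running. The only probe defined is: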

      livenessProbe:
        failureThreshold: 2
        httpGet:
          path: /healthz
          port: 61779
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10

But as we can see in the following output, the LB controller Pod is in the Pending state:

[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f   0/1     Pending   0          30m
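
When the listing is longer than this one, a small filter can pick out the Pods that are stuck. This is only a sketch: pending_pods is a hypothetical helper name, and it assumes the default single-namespace column layout shown above (NAME, READY, STATUS, ...), so it will not work as written with -A, which adds a NAMESPACE column.

```shell
# Hypothetical helper: print the names of Pods whose STATUS column is Pending.
# Reads the tabular output of `kubectl get pods` on stdin.
# Assumes the default column order NAME READY STATUS RESTARTS AGE.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n kube-system | pending_pods
#
# kubectl can also filter on the server side:
#   kubectl get pods -n kube-system --field-selector=status.phase=Pending
```

The field-selector form is usually the simpler choice; the awk filter is handy when you only have saved output to work with.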

If a Pod stays in the Pending state, it means kube-scheduler could not bind the Pod to a cluster node, for whatever reason.

kube-scheduler is the part of the Kubernetes control plane that assigns Pods to nodes.

No Pod logs exist at this stage, because the Pod's containers have not started yet.

The most convenient way to check the cause is the kubectl describe command:

kubectl describe pod/podname -n namespacename

At the bottom of the output there is a list of events related to the Pod lifecycle. Here is an example for a generic Ubuntu Pod:

Events:
  Type    Reason     Age                From               Message
  ----    ------     ----               ----               -------
  Normal  Scheduled  37s                default-scheduler  Successfully assigned default/ubuntu to k8s-w1
  Normal  Pulling    25s (x2 over 35s)  kubelet, k8s-w1    Pulling image "ubuntu"
  Normal  Pulled     23s (x2 over 30s)  kubelet, k8s-w1    Successfully pulled image "ubuntu"
  Normal  Created    23s (x2 over 30s)  kubelet, k8s-w1    Created container ubuntu
  Normal  Started    23s (x2 over 29s)  kubelet, k8s-w1    Started container ubuntu

The kubectl get events command can also reveal the problem. For example:

LAST SEEN   TYPE     REASON      OBJECT       MESSAGE
21s         Normal   Scheduled   pod/ubuntu   Successfully assigned default/ubuntu to k8s-w1
9s          Normal   Pulling     pod/ubuntu   Pulling image "ubuntu"
7s          Normal   Pulled      pod/ubuntu   Successfully pulled image "ubuntu"
7s          Normal   Created     pod/ubuntu   Created container ubuntu
7s          Normal   Started     pod/ubuntu   Started container ubuntu

Or, when the scheduler cannot assign the Pod to a node, the reason may look like this:

"No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)".
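
To narrow a long event listing down to scheduling failures like the one quoted above, the output can be filtered on the REASON column. A minimal sketch, assuming the default kubectl get events column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); failed_scheduling_events is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the event rows whose REASON is FailedScheduling.
# Reads the tabular output of `kubectl get events` on stdin.
# Assumes the default columns: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE,
# where LAST SEEN is a single token such as "21s".
failed_scheduling_events() {
  awk 'NR > 1 && $3 == "FailedScheduling"'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get events -n kube-system | failed_scheduling_events
#
# kubectl can also filter on the server side:
#   kubectl get events -n kube-system --field-selector reason=FailedScheduling
```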

In some cases, the error can be found in the logs of the kube-scheduler Pod in the kube-system namespace. The logs can be listed with the following command:

kubectl logs $(kubectl get pods -l component=kube-scheduler,tier=control-plane -n kube-system -o name) -n kube-system

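The most common reasons a Pod is not scheduled are:

The nodes do not have enough CPU or memory for what the Pod requests.
The Pod does not tolerate the taints on the nodes.
The Pod has an Affinity/AntiAffinity configuration that prevents it from being scheduled.
Storage or other specific resource requirements in the Pod spec (such as GPU) cannot be met.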
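
As a rough aid, a FailedScheduling message can be mapped onto the causes above by matching on its text. This is a sketch under stated assumptions: classify_scheduling_failure is a hypothetical helper, and the substrings it matches are my guesses at typical scheduler wording, which differs between Kubernetes versions. Only the first matching cause is reported.

```shell
# Hypothetical helper: map a scheduler message to one of the causes listed above.
# The matched substrings are assumptions about typical wording; adjust them
# to the messages your cluster actually emits.
classify_scheduling_failure() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*) echo "cpu-or-memory" ;;
    *taint*)                                      echo "taint" ;;
    # Checked before affinity so that volume-related affinity conflicts
    # are reported as a storage problem.
    *PersistentVolumeClaim*|*volume*)             echo "storage" ;;
    *"Insufficient "*)                            echo "other-resource" ;;
    *affinity*)                                   echo "affinity" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Example with the message quoted earlier in this answer:
classify_scheduling_failure "No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)"
```

The result is only a hint about where to look next, for example node capacity and taints in kubectl describe nodes, or the affinity and volume sections of the Pod spec.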
