I've helm-installed ingress-nginx into my EKS cluster. The pod keeps failing, and its logs indicate the application cannot bind to 0.0.0.0:8443 (INADDR_ANY:8443). I've confirmed that 0.0.0.0:8443 is indeed already bound in the container, but since I don't yet have root access to the container, I haven't been able to identify the culprit process/user.
I've created an issue on the kubernetes ingress-nginx project I'm using, but I also wanted to reach the broader SO community, which may have insight, solutions, and troubleshooting suggestions for getting past this obstacle.
As someone new to AWS/EKS and Kubernetes, there may be some environment misconfiguration causing this problem. For example, could it be caused by a misconfigured AWS construct such as the VPC, its subnets, or security groups? Thanks in advance for your help!
The linked GitHub issue provides details about the Terraform-provisioned EKS environment as well as the Helm install of ingress-nginx. Here are some key details:
- The EKS cluster is configured to use only Fargate workers, with 3 public and 3 private subnets; all 6 subnets are available to the cluster and to each of its Fargate profiles.
- It should also be noted that the cluster is new, and the ingress-nginx pod is the first attempt to deploy anything to it, aside from kube-system items such as coredns, which has been configured to run in Fargate. (That required manually removing the default ec2 annotation, as described here.)
- There are 6 Fargate profiles, but only 2 are currently in use: `coredns` and `ingress`. These are dedicated to kube-system/kube-dns and ingress-nginx, respectively. Beyond the namespaces and labels of their selectors, the profile specs have nothing "custom". I've confirmed that the selectors work for both coredns and ingress; i.e., the ingress pod is scheduled to run, but then fails.
- The reason ingress-nginx uses port 8443 is that I first ran into this Privilege Escalation issue, whose workaround requires disabling `allowPrivilegeEscalation` and moving from privileged to unprivileged ports. I'm invoking `helm install` with the following values:
```yaml
controller:
  extraArgs:
    http-port: 8080
    https-port: 8443
  containerPort:
    http: 8080
    https: 8443
  service:
    ports:
      http: 80
      https: 443
    targetPorts:
      http: 8080
      https: 8443
  image:
    allowPrivilegeEscalation: false
  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes
  livenessProbe:
    initialDelaySeconds: 60 # 30
  readinessProbe:
    initialDelaySeconds: 60 # 0
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```
- Since my initial observation (before I looked at the logs) was that the K8s liveness/readiness probes were failing/timing out, I first tried extending their `initialDelaySeconds` in the values passed to helm install. But eventually I looked at the pod/container logs and found that, regardless of the *ness probe settings, each time I reinstall ingress-nginx and wait a bit, the logs show the same bind error reported here:
```
2021/11/12 17:15:02 [emerg] 27#27: bind() to [::]:8443 failed (98: Address in use)
```
- Aside from what I've noted above, I haven't intentionally configured anything "non-stock". I'm a bit lost in AWS/K8s's sea of configuration looking for what piece I need to adapt/correct.
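For completeness, the values above are passed to Helm in the usual way. The exact invocation below is a reconstruction: the release name `nginx-ingress` and namespace `ingress` are inferred from the pod name `nginx-ingress-ingress-nginx-controller-...` and the `-n ingress` flag used in the kubectl commands later in this post.

```shell
# Reconstructed install command (release name and namespace inferred
# from the pod names shown further down; values.yaml holds the values
# block quoted above).
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress --create-namespace \
  --values values.yaml
```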
Do you have clues or guesses why INADDR_ANY, port 8443 would already be bound in my (fairly-standard) `nginx-ingress-ingress-nginx-controller` pod/container?
As I alluded to earlier, I am able to execute the `netstat` command inside the running container as the default user `www-data` to confirm that 0:8443 is indeed already bound, but because I haven't yet figured out how to get root access, the PID/name of the processes are not available to me:
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:10245 0.0.0.0:* LISTEN -
tcp 3 0 127.0.0.1:10246 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:10247 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8181 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8181 0.0.0.0:* LISTEN -
tcp 0 0 :::8443 :::* LISTEN -
tcp 0 0 :::10254 :::* LISTEN -
tcp 0 0 :::8080 :::* LISTEN -
tcp 0 0 :::8080 :::* LISTEN -
tcp 0 0 :::8181 :::* LISTEN -
tcp 0 0 :::8181 :::* LISTEN -
```
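One detail worth noting in the netstat output above: there is a `:::8443` listener but no separate `0.0.0.0:8443` line. On Linux, unless a socket sets `IPV6_V6ONLY` (or the `net.ipv6.bindv6only` sysctl is 1), a socket bound to `[::]:8443` is dual-stack and also accepts IPv4 connections, so a later `bind()` to `0.0.0.0:8443` fails with EADDRINUSE even though netstat never shows a `0.0.0.0:8443` entry. A quick check from inside the container (a diagnostic guess, not something from the original logs):

```shell
# If this prints 0 (the Linux default), a [::]:8443 listener already
# covers 0.0.0.0:8443 for any second process that tries to bind it.
cat /proc/sys/net/ipv6/bindv6only 2>/dev/null || echo unknown
```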
```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- /bin/bash
bash-5.1$ whoami
www-data
bash-5.1$ ps aux
PID USER TIME COMMAND
1 www-data 0:00 /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx
8 www-data 0:00 /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx --configmap=ingress/n
28 www-data 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf
30 www-data 0:00 nginx: worker process
45 www-data 0:00 /bin/bash
56 www-data 0:00 ps aux
```
I'm currently looking into how to get root access to my Fargate containers (without mucking about in their Dockerfiles to install SSH...) so I can get more insight into which process/user is binding INADDR_ANY:8443 in this pod/container.
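That said, since every process in the container runs as `www-data` (per the `ps aux` output above), root may not actually be needed to map the socket to a PID: a process can read the `/proc/<pid>/fd` entries of other processes owned by the same user. A hypothetical sketch, assuming a busybox-style shell with `awk` as found in typical controller images, that matches the 8443 socket's inode from `/proc/net/tcp6` against fd symlinks:

```shell
# Sketch: find which PID holds the :::8443 listener without netstat -p.
# /proc/net/tcp6 lists local addresses in hex, so convert 8443 first.
PORT_HEX=$(printf '%04X' 8443)   # 8443 -> 20FB
# Field 2 is the local address ("...:20FB"), field 10 is the socket inode.
INODE=$(awk -v p=":$PORT_HEX" '$2 ~ p"$" {print $10}' /proc/net/tcp6 | head -n 1)
for fd in /proc/[0-9]*/fd/*; do
  if [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$INODE]" ]; then
    pid=${fd#/proc/}; pid=${pid%%/*}
    echo "port 8443 held by PID $pid ($(cat "/proc/$pid/comm" 2>/dev/null))"
  fi
done
```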