We are seeing intermittent connectivity/DNS issues on a Kubernetes 1.10 cluster running on Ubuntu. We have been digging through bug reports and similar, and recently narrowed it down to a process holding on to /run/xtables.lock, which is causing problems for the kube-proxy pods.

One of the kube-proxy pods bound to a worker node repeats this error in its logs:
E0920 13:39:42.758280 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 13:46:46.193919 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:05:45.185720 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:11:52.455183 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:38:36.213967 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:44:43.442933 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
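For reference, a manual check on the affected worker along these lines should show whether the chains exist and whether the lock is actually contended at that moment (this is just our own ad-hoc sketch; the 5 second wait mirrors the timeout in the kube-proxy message above):

iptables -w 5 -t filter -L KUBE-SERVICES
iptables -w 5 -t filter -L KUBE-EXTERNAL-SERVICES
ps -ef | grep -E '[i]ptables|[x]tables'

If iptables itself blocks on the lock here, something else on the node really is holding it at that moment.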
The errors started about three weeks ago and so far we have been unable to fix them. Because the problem is intermittent, it took us until now to trace it this far.
We believe this is also what is keeping one of the kube-flannel-ds pods in a permanent CrashLoopBackOff state:
NAME                                 READY   STATUS             RESTARTS   AGE
coredns-78fcdf6894-6z6rs             1/1     Running            0          40d
coredns-78fcdf6894-dddqd             1/1     Running            0          40d
etcd-k8smaster1                      1/1     Running            0          40d
kube-apiserver-k8smaster1            1/1     Running            0          40d
kube-controller-manager-k8smaster1   1/1     Running            0          40d
kube-flannel-ds-amd64-sh5gc          1/1     Running            0          40d
kube-flannel-ds-amd64-szkxt          0/1     CrashLoopBackOff   7077       40d
kube-proxy-6pmhs                     1/1     Running            0          40d
kube-proxy-d7d8g                     1/1     Running            0          40d
kube-scheduler-k8smaster1            1/1     Running            0          40d
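For what it's worth, the crashing flannel pod can be inspected with something like the following (we assume the kube-system namespace, same as the listing above; add -c <container> if the pod has more than one container):

kubectl -n kube-system logs kube-flannel-ds-amd64-szkxt --previous
kubectl -n kube-system describe pod kube-flannel-ds-amd64-szkxt

The --previous flag is there because the container keeps restarting, so the interesting output is usually from the last crashed instance.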
Most of the bug reports about /run/xtables.lock suggest the issue was resolved back in July 2017, yet we are seeing it on a fresh setup. The expected chains do appear to be configured in iptables. Running fuser /run/xtables.lock returns nothing.
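Since fuser comes up empty (presumably the lock is only held for a moment), the only other idea we have is to watch the file with auditd and see what touches it, roughly like this (assumes auditd is installed on the worker; the key name is arbitrary):

auditctl -w /run/xtables.lock -p rwa -k xtables-lock
ausearch -k xtables-lock -i

That should at least name the process that grabs the lock the next time kube-proxy loses the race.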
Does anyone have any insight into this? It is causing us a lot of pain.