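I have a self-managed Kubernetes cluster consisting of one master node and 3 worker nodes. Within the cluster I use flannel as the Container Network Interface (CNI) plugin.
On all of my machines I see kernel messages like the following: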
Apr 12 04:22:24 worker-7 kernel: [278523.379954] iptables[6260]: segfault at 88 ip 00007f9e69fefe47 sp 00007ffee4dff356 error 4 in libnftnl.so.11.3.0[7f9e69feb000+16000]
Apr 12 04:22:24 worker-7 kernel: [278523.380094] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 78 00 00 00 48 39 f8 74 0b 85 f6 74
Apr 12 05:59:10 worker-7 kernel: [284329.182667] iptables[13978]: segfault at 88 ip 00007fb799fafe47 sp 00007fff22419b36 error 4 in libnftnl.so.11.3.0[7fb799fab000+16000]
Apr 12 05:59:10 worker-7 kernel: [284329.182774] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 98 00 00 00 48 39 f8 74 0b 85 f6 74
Apr 12 08:29:25 worker-7 kernel: [293343.999073] iptables[16041]: segfault at 88 ip 00007fa40c7f7e47 sp 00007ffe04ba9886 error 4 in libnftnl.so.11.3.0[7fa40c7f3000+16000]
Apr 12 08:29:25 worker-7 kernel: [293343.999165] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 98 00 00 00 48 39 f8 74 0b 85 f6 74
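To make these lines easier to read, here is a minimal sketch that pulls the fields out of a kernel segfault message and decodes the x86 page-fault error code. The function name `decode_segfault` and the returned dictionary keys are my own choices for illustration. The offset it computes is relative to the start of the mapping that the kernel prints in brackets, which is not necessarily the library's load base.

```python
import re

# Matches lines such as:
#   iptables[6260]: segfault at 88 ip ... sp ... error 4 in libnftnl.so.11.3.0[7f9e69feb000+16000]
SEGFAULT_RE = re.compile(
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "library": m["lib"],
        # Address whose access faulted; a small value suggests a NULL pointer plus a field offset.
        "fault_addr": int(m["addr"], 16),
        # Instruction pointer relative to the start of the mapping printed by the kernel.
        "ip_offset": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits: bit 0 = protection fault, bit 1 = write, bit 2 = user mode.
        "reason": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    sample = (
        "Apr 12 04:22:24 worker-7 kernel: [278523.379954] iptables[6260]: "
        "segfault at 88 ip 00007f9e69fefe47 sp 00007ffee4dff356 error 4 "
        "in libnftnl.so.11.3.0[7f9e69feb000+16000]"
    )
    info = decode_segfault(sample)
    print(info["program"], info["library"], hex(info["ip_offset"]),
          info["mode"], info["access"], info["reason"])
```

Applied to the three segfault lines above, the instruction pointer sits at the same offset (0x4e47) inside the libnftnl.so.11.3.0 mapping each time, and error 4 decodes to a user-mode read of a page that is not present. Together with the fault address 0x88, that looks more like a repeatable NULL pointer dereference in the library than a random memory error, although this is only my reading of the log.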
I have narrowed it down: the messages originate from the kube-flannel-ds pods, which log messages like these:
Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -s 10.244.0.0/16 -j ACCEPT --wait]: exit status -1:
Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully --wait]: exit status -1:
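Can someone explain what these messages mean? Could this be a hardware problem? Would it make sense to switch from flannel to another Kubernetes Container Network Interface (CNI) plugin, e.g. Calico?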