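I'm running Kafka in a Kubernetes cluster on VMware with one control-plane node and one worker node. From the control-plane node my client can communicate with Kafka, but from the worker node it eventually fails with this error: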
%3|1638529687.405|FAIL|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap]: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
%3|1638529687.406|ERROR|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:app]: apollo-prototype-765f4d8bcf-bjpf4#producer-2: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
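To reproduce just the failing step (name resolution, before any Kafka or SASL traffic), a minimal sketch like the one below can be run from a pod on each node. It assumes the pod is in the same namespace as the Kafka cluster, because the short Service name "my-cluster-kafka-bootstrap" from the error only resolves through the pod's DNS search path. The helper name can_resolve is hypothetical.

```python
import socket
import sys
from typing import Union


def can_resolve(host: str, port: Union[int, str]) -> bool:
    """Return True if host:port resolves through the system resolver.

    Inside a pod this uses the pod's /etc/resolv.conf, i.e. cluster DNS.
    (Hypothetical helper for illustration.)
    """
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror as exc:
        # glibc's text for EAI_AGAIN is "Temporary failure in name resolution",
        # the same message librdkafka logged above.
        print(f"{host}:{port} -> {exc}", file=sys.stderr)
        return False
```

Calling can_resolve("my-cluster-kafka-bootstrap", 9092) in a pod on the control-plane node and in a pod on the worker node shows whether the failure is specific to the worker's pod network, independently of the Kafka client.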
Here is my Kafka cluster manifest (using Strimzi):
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        authentication:
          type: scram-sha-512
      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: scram-sha-512
        configuration:
          class: nginx
          bootstrap:
            host: localb.kafka.xxx.com
          brokers:
            - broker: 0
              host: local.kafka.xxx.com
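For reference, a client using the "plain" listener (internal, port 9092, tls: false, SCRAM-SHA-512) would use SASL over plaintext, which matches the "sasl_plaintext://" prefix in the error. The sketch below only builds the librdkafka settings; the namespace, credentials and the default cluster domain "cluster.local" are assumptions, and the function name is hypothetical. Using the fully qualified Service name removes the dependency on the pod's search path, but it would not by itself fix a pod network that cannot reach CoreDNS.

```python
from typing import Dict


def plain_listener_config(cluster: str, namespace: str,
                          username: str, password: str) -> Dict[str, str]:
    """Hypothetical helper: librdkafka settings for the Strimzi "plain" listener."""
    return {
        # Strimzi names the internal bootstrap Service "<cluster>-kafka-bootstrap".
        # Assumes the default cluster domain "cluster.local".
        "bootstrap.servers": f"{cluster}-kafka-bootstrap.{namespace}.svc.cluster.local:9092",
        # tls: false + scram-sha-512 authentication -> SASL over plaintext.
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.mechanisms": "SCRAM-SHA-512",
        "sasl.username": username,  # placeholder credentials
        "sasl.password": password,
    }
```

The resulting dict can be passed to a librdkafka-based client, for example confluent_kafka.Producer(conf), with "kafka" and the credentials replaced by real values.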
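It's worth mentioning that the exact same configuration works perfectly when I run it in the cloud.
Telnet and nslookup (from both nodes) fail with an error. The CoreDNS logs don't even mention it. The firewall is disabled on both nodes.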
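Since CoreDNS logged nothing, one possibility is that the queries never reached it. A telnet-style check can test that directly: read the nameserver from the pod's /etc/resolv.conf and try a TCP connection to port 53 (CoreDNS serves DNS over TCP as well as UDP). This is a minimal sketch; the helper names are hypothetical.

```python
import socket
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """Extract "nameserver" entries from the text of /etc/resolv.conf."""
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Telnet-style check: can a TCP connection to host:port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

In a pod on each node, checking tcp_reachable(ns, 53) for every ns in nameservers(open("/etc/resolv.conf").read()) separates the two cases: if the cluster DNS IP is reachable from the control-plane pod but not from the worker pod, the problem is likely in the pod network (CNI) rather than in CoreDNS itself.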
Can you help me? Thanks!
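Update: solution. The Calico pod on the worker node kept logging an error from BIRD (Calico's BGP daemon), "bird: Netlink: Network is down", even though the pod had not crashed: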
2021-12-03 09:39:58.051 [INFO][90] felix/int_dataplane.go 1539: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.051 [INFO][90] felix/hostip_mgr.go 85: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.052 [INFO][90] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2021-12-03 09:39:58.057 [INFO][90] felix/ipsets.go 785: Doing full IP set rewrite family="inet" numMembersInPendingReplace=3 setID="this-host"
2021-12-03 09:39:58.059 [INFO][90] felix/int_dataplane.go 1036: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
2021-12-03 09:39:58.082 [INFO][90] felix/int_dataplane.go 1521: Received interface update msg=&intdataplane.ifaceUpdate{Name:"tunl0", State:"down", Index:13}
bird: Netlink: Network is down
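The log shows tunl0, the IP-in-IP tunnel device Calico uses when IPIP encapsulation is enabled, going "down". A rough way to compare that device across nodes is to read its flags from sysfs on each node itself. This sketch checks the administrative IFF_UP bit, which may not match exactly what Felix reports from netlink; the helper names are hypothetical.

```python
from pathlib import Path

IFF_UP = 0x1  # "interface is up" bit, as defined in <linux/if.h>


def flags_say_up(flags_text: str) -> bool:
    """Interpret the hex value in /sys/class/net/<iface>/flags (e.g. "0xc1")."""
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def iface_is_up(name: str, sysfs_root: str = "/sys/class/net") -> bool:
    """True if the interface exists on this host and its IFF_UP flag is set."""
    flags_file = Path(sysfs_root) / name / "flags"
    if not flags_file.exists():
        return False  # interface not present here (or no sysfs)
    return flags_say_up(flags_file.read_text())
```

Running iface_is_up("tunl0") on both nodes would be expected to return False on the worker while it is in the state shown in the log above.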
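Here is what I did, and it worked like a charm.
The failure was caused by the nodes having different kernel modules loaded. I had configured the ipip module for the new node, but the old node did not load the ipip module, which made Calico misbehave. Unloading the ipip module brought everything back to normal.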
    [root@k8s-node236-232 ~]# lsmod | grep ipip
    ipip                   16384  0
    tunnel4                16384  1 ipip
    ip_tunnel              24576  1 ipip
    [root@k8s-node236-232 ~]# modprobe -r ipip
    [root@k8s-node236-232 ~]# lsmod | grep ipip
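lsmod reads its data from /proc/modules, so the kind of mismatch described above can be spotted by saving /proc/modules from each node and comparing the two. A minimal sketch with hypothetical helper names:

```python
from typing import Set


def loaded_modules(proc_modules: str) -> Set[str]:
    """Module names from the text of /proc/modules (the first column, as lsmod prints it)."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def module_diff(node_a: str, node_b: str) -> Set[str]:
    """Modules loaded on exactly one of the two nodes."""
    return loaded_modules(node_a) ^ loaded_modules(node_b)
```

For example, after saving cat /proc/modules on each node to node-a.txt and node-b.txt (placeholder file names), module_diff(open("node-a.txt").read(), open("node-b.txt").read()) containing "ipip" would point to the same difference that was fixed here with modprobe -r ipip.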