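I am running Kafka in a Kubernetes cluster on VMware with one control plane node and one worker node. From the control plane node my client can communicate with Kafka, but from the worker node it ends up with this error: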

   %3|1638529687.405|FAIL|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap]: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
   %3|1638529687.406|ERROR|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:app]: apollo-prototype-765f4d8bcf-bjpf4#producer-2: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)

Here is my Kafka cluster manifest (using Strimzi):

listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    authentication:
      type: scram-sha-512
  - name: external
    port: 9094
    type: ingress
    tls: true
    authentication:
      type: scram-sha-512
    configuration:
      class: nginx
      bootstrap:
        host: localb.kafka.xxx.com
      brokers:
      - broker: 0
        host: local.kafka.xxx.com
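
For reference, the error above comes from a librdkafka-based client using sasl_plaintext, which corresponds to the internal "plain" listener (port 9092, no TLS, SCRAM-SHA-512). A minimal sketch of the client settings that would match that listener, assuming librdkafka-style property names; the function name and the credential arguments are hypothetical placeholders, not values from my setup:

```python
def internal_listener_config(username: str, password: str) -> dict:
    """Build librdkafka-style settings for the internal 'plain' listener.

    The username and password are placeholders (assumption); the bootstrap
    address, protocol and mechanism mirror the manifest and the error log.
    """
    return {
        # Service name and port taken from the error message above.
        "bootstrap.servers": "my-cluster-kafka-bootstrap:9092",
        # tls: false + SCRAM authentication means SASL over plaintext.
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.mechanisms": "SCRAM-SHA-512",
        "sasl.username": username,
        "sasl.password": password,
    }
```

A dict like this is what a librdkafka-based producer would typically be constructed from; name resolution of the bootstrap address happens before any of the SASL settings come into play.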

It is worth mentioning that the exact same configuration works perfectly when I run it in the cloud.

Telnet and nslookup (from both nodes) throw an error. The CoreDNS logs do not even mention the error. The firewall is also disabled on both nodes.
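
The same check can be scripted from inside the client pod. This is only a rough sketch: the namespace passed to candidate_names and the default cluster domain "cluster.local" are assumptions, and both function names are made up for illustration. It tries the short service name as well as the longer forms a Kubernetes Service is normally reachable under:

```python
import socket


def candidate_names(service, namespace, cluster_domain="cluster.local"):
    """Return the DNS names a Kubernetes Service is usually reachable under.

    The namespace and cluster domain are assumptions supplied by the caller.
    """
    return [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


def try_resolve(host, port):
    """Resolve host:port and return (sorted addresses, error message or None)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # This is the failure class behind "Temporary failure in name resolution".
        return [], str(exc)
    # Each entry's sockaddr tuple starts with the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, None
```

Calling try_resolve(name, 9092) for each name from candidate_names("my-cluster-kafka-bootstrap", "<namespace>") in a pod on each node shows whether only the short name fails or whether cluster DNS is unreachable from that node altogether.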

Can you please help me out? Thanks!


UPDATE: Solution. The Calico pod (on the worker node) was complaining with bird: Netlink: Network is down, even though it was not crashing.

2021-12-03 09:39:58.051 [INFO][90] felix/int_dataplane.go 1539: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.051 [INFO][90] felix/hostip_mgr.go 85: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.052 [INFO][90] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2021-12-03 09:39:58.057 [INFO][90] felix/ipsets.go 785: Doing full IP set rewrite family="inet" numMembersInPendingReplace=3 setID="this-host"
2021-12-03 09:39:58.059 [INFO][90] felix/int_dataplane.go 1036: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
2021-12-03 09:39:58.082 [INFO][90] felix/int_dataplane.go 1521: Received interface update msg=&intdataplane.ifaceUpdate{Name:"tunl0", State:"down", Index:13}
bird: Netlink: Network is down

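Here is what I did, and it worked like a charm!

The fault was caused by the nodes loading different kernel modules. I had configured the ipip module on the new node, but the old node did not load the ipip module, which caused the Calico error. Unloading the ipip module (modprobe -r ipip) restored normal operation.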

[root@k8s-node236-232 ~]# lsmod  | grep ipip
ipip                   16384  0 
tunnel4                16384  1 ipip
ip_tunnel              24576  1 ipip
[root@k8s-node236-232 ~]# modprobe -r ipip
[root@k8s-node236-232 ~]# lsmod  | grep ipip

1 Answer

The Calico pod (on the worker node) was complaining with bird: Netlink: Network is down, even though it was not crashing.

2021-12-03 09:39:58.051 [INFO][90] felix/int_dataplane.go 1539: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.051 [INFO][90] felix/hostip_mgr.go 85: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.052 [INFO][90] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2021-12-03 09:39:58.057 [INFO][90] felix/ipsets.go 785: Doing full IP set rewrite family="inet" numMembersInPendingReplace=3 setID="this-host"
2021-12-03 09:39:58.059 [INFO][90] felix/int_dataplane.go 1036: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
2021-12-03 09:39:58.082 [INFO][90] felix/int_dataplane.go 1521: Received interface update msg=&intdataplane.ifaceUpdate{Name:"tunl0", State:"down", Index:13}
bird: Netlink: Network is down
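
To spot this condition without reading the log by eye, the Felix lines can be scanned for interfaces whose last reported state is "down". A small sketch, assuming the log format shown above; the function name is hypothetical and the regular expression only matches the "Linux interface state changed" lines:

```python
import re

# Matches e.g.: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
STATE_CHANGE = re.compile(
    r'Linux interface state changed\..*?ifaceName="(?P<name>[^"]+)"\s+state="(?P<state>[^"]+)"'
)


def interfaces_last_seen_down(log_text):
    """Return interface names whose most recent logged state is 'down'."""
    last_state = {}
    for line in log_text.splitlines():
        match = STATE_CHANGE.search(line)
        if match:
            # Later lines overwrite earlier ones, so only the final state counts.
            last_state[match.group("name")] = match.group("state")
    return sorted(name for name, state in last_state.items() if state == "down")
```

For the excerpt above this would report tunl0, the IP-in-IP tunnel interface that Calico uses.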

Here is what I have done and it worked like a charm!

The fault was caused by the nodes loading different kernel modules. I had configured the ipip module on the new node, but the old node did not load the ipip module, which caused the Calico error. Unloading the ipip module (modprobe -r ipip) restored normal operation.

[root@k8s-node236-232 ~]# lsmod  | grep ipip
ipip                   16384  0 
tunnel4                16384  1 ipip
ip_tunnel              24576  1 ipip
[root@k8s-node236-232 ~]# modprobe -r ipip
[root@k8s-node236-232 ~]# lsmod  | grep ipip
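
If you want to compare the nodes before unloading anything, the lsmod output from each node can be diffed. This is just a sketch under the assumption that you have captured the lsmod text from both nodes yourself; the function names are made up for illustration:

```python
def loaded_modules(lsmod_output):
    """Return the set of module names found in lsmod-style output."""
    modules = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        modules.add(fields[0])
    return modules


def module_differences(node_a_output, node_b_output):
    """Return (modules only on node A, modules only on node B), each sorted."""
    a = loaded_modules(node_a_output)
    b = loaded_modules(node_b_output)
    return sorted(a - b), sorted(b - a)
```

A module such as ipip showing up on only one side is the kind of mismatch described above.
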
answered 2021-12-06T09:41:06.750