
Zookeeper 3.5.3-beta does not work for me with GCloud Kubernetes Engine. Using the identical configuration with Zookeeper 3.4.10 works.

When I run a client sanity test, the only exception returned is this:

2017-11-29 14:27:17,597 [myid:1] - WARN  [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@273] - Unexpected exception, tries=0, remaining init limit=20000, connecting to zk-2.zk-svc.default.svc.cluster.local:2888
java.net.UnknownHostException: zk-2.zk-svc.default.svc.cluster.local

It has been suggested that this problem is kube-dns related, as indicated here, but kube-dns (dns.go:48] version: 1.14.4-2-g5584e04) seems to be working as expected:

/ # nslookup zk-0.zk-svc.default.svc.cluster.local
Server:    10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local

Name:      zk-0.zk-svc.default.svc.cluster.local
Address 1: 10.60.3.3 zk-0.zk-svc.default.svc.cluster.local
/ # nslookup zk-2.zk-svc.default.svc.cluster.local
Server:    10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local

Name:      zk-2.zk-svc.default.svc.cluster.local
Address 1: 10.60.4.3 zk-2.zk-svc.default.svc.cluster.local
/ # nslookup zk-1.zk-svc.default.svc.cluster.local
Server:    10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local

Name:      zk-1.zk-svc.default.svc.cluster.local
Address 1: 10.60.2.5 zk-1.zk-svc.default.svc.cluster.local

And there are no errors in the kube-dns log.

In 3.4.10, the first node also throws UnknownHostException during initialization, but it eventually logs a successful resolution like the one below. In 3.5.3 that never happens:

2017-11-29 15:14:39,923 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: zk-0.zk-svc.default.svc.cluster.local to address: zk-0.zk-svc.default.svc.cluster.local/10.60.4.4
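
Since the names resolve with nslookup after the fact, one thing worth checking is whether resolution inside the pod simply lags behind pod startup. The following is only a sketch for that check: `wait_for_dns` is a hypothetical helper, not part of Zookeeper or Kubernetes, and it uses Python's resolver rather than the JVM's, so it can show when a name becomes resolvable from the pod but not what Zookeeper itself does with it.

```python
import socket
import time


def wait_for_dns(host, attempts=10, delay=2.0,
                 resolver=socket.gethostbyname, sleep=time.sleep):
    """Poll until `host` resolves.

    Returns (address, failed_attempts) on success,
    or (None, attempts) if every attempt failed.
    `resolver` and `sleep` are injectable so the helper can be
    exercised without real DNS (an assumption made for testing).
    """
    for failed in range(attempts):
        try:
            return resolver(host), failed
        except OSError:  # socket.gaierror is a subclass of OSError
            sleep(delay)
    return None, attempts
```

Run from inside one of the pods, a call such as `wait_for_dns("zk-2.zk-svc.default.svc.cluster.local")` would report how many lookups failed before the name resolved, which can then be compared with the `tries=` and `remaining init limit=` values in the Zookeeper warning.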

I do not have enough information to file an issue with Zookeeper, so I would appreciate any suggestions on how to debug this.


1 Answer


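Following recent comments on ZOOKEEPER-2343, I deployed a 3.6.0-SNAPSHOT image. The second and third nodes accept client requests immediately, but the first node does not and reports "This ZooKeeper instance is not currently serving requests".

Deleting the first node resolves the issue, because it can then participate in the quorum when it starts up again.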

Answered 2017-11-30T09:17:20.277