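We are using Hazelcast 4.2.1 in a Kubernetes environment with the openjdk:14-jdk-slim image. In our development environment, which has only two nodes, the two nodes sometimes (roughly once every five deployments) end up in a split-brain state and never merge, even though they find each other and agree on how to merge: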
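The first node's joiner says the second node should merge into it, and the second node agrees that it should merge into the first node. But nothing happens. The log repeats every few minutes, and the cluster is never merged.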
It makes no difference whether or not we configure a merge policy. Most of the time, everything works without any problem.
Log of the first node:
```
2021-07-20 09:14:08.306 DEBUG 142 --- [hz.hazelcast-instance.cached.thread-4] c.h.i.cluster.impl.MembershipManager : [10.41.31.101]:5701 [light-cluster] [4.2.1] Sending member list to the non-master nodes:
Members {size:1, ver:5} [
Member [10.41.31.101]:5701 - 7263bccd-f330-4b96-8b52-f22db7c7a90e this
]
2021-07-20 09:14:08.446 DEBUG 142 --- [hz.hazelcast-instance.cached.thread-5] c.h.i.cluster.impl.DiscoveryJoiner : [10.41.31.101]:5701 [light-cluster] [4.2.1] Sending SplitBrainJoinMessage to [10.41.31.102]:5701
2021-07-20 09:14:08.448 DEBUG 142 --- [hz.hazelcast-instance.cached.thread-5] c.h.i.cluster.impl.ClusterJoinManager : [10.41.31.101]:5701 [light-cluster] [4.2.1] Checking if we should merge to: SplitBrainJoinMessage{packetVersion=4, buildNumber=20210630, memberVersion=4.2.1, clusterVersion=4.2, address=[10.41.31.102]:5701, uuid='9cdd64b4-62c8-4f19-bf29-d3cef4e8e2f6', liteMember=false, memberCount=1, dataMemberCount=1, memberListVersion=1}
2021-07-20 09:14:08.449 INFO 142 --- [hz.hazelcast-instance.cached.thread-5] c.h.i.cluster.impl.ClusterJoinManager : [10.41.31.101]:5701 [light-cluster] [4.2.1] [10.41.31.102]:5701 should merge to us, both have the same data member count: 1
2021-07-20 09:14:23.277 DEBUG 142 --- [hz.hazelcast-instance.cached.thread-4] c.h.i.p.InternalPartitionService : [10.41.31.101]:5701 [light-cluster] [4.2.1] Checking partition state, stamp: -5900145379368197006
```
Log of the second node:
```
2021-07-20 09:14:24.149 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-4] c.h.i.p.InternalPartitionService : [10.41.31.102]:5701 [light-cluster] [4.2.1] Checking partition state, stamp: -8661523421455686299
2021-07-20 09:14:24.175 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-4] c.h.s.d.integration.DiscoveryService : [10.41.31.102]:5701 [light-cluster] [4.2.1] Using service name to discover nodes.
2021-07-20 09:14:24.176 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-6] c.h.i.cluster.impl.MembershipManager : [10.41.31.102]:5701 [light-cluster] [4.2.1] Sending member list to the non-master nodes:
Members {size:1, ver:1} [
Member [10.41.31.102]:5701 - 9cdd64b4-62c8-4f19-bf29-d3cef4e8e2f6 this
]
2021-07-20 09:14:39.149 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-4] c.h.i.p.InternalPartitionService : [10.41.31.102]:5701 [light-cluster] [4.2.1] Checking partition state, stamp: -8661523421455686299
2021-07-20 09:14:54.148 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-6] c.h.i.p.InternalPartitionService : [10.41.31.102]:5701 [light-cluster] [4.2.1] Checking partition state, stamp: -8661523421455686299
2021-07-20 09:15:08.423 DEBUG 141 --- [hz.hazelcast-instance.priority-generic-operation.thread-0] c.h.i.cluster.impl.ClusterJoinManager : [10.41.31.102]:5701 [light-cluster] [4.2.1] Checking if we should merge to: SplitBrainJoinMessage{packetVersion=4, buildNumber=20210630, memberVersion=4.2.1, clusterVersion=4.2, address=[10.41.31.101]:5701, uuid='7263bccd-f330-4b96-8b52-f22db7c7a90e', liteMember=false, memberCount=1, dataMemberCount=1, memberListVersion=5}
2021-07-20 09:15:08.423 INFO 141 --- [hz.hazelcast-instance.priority-generic-operation.thread-0] c.h.i.cluster.impl.ClusterJoinManager : [10.41.31.102]:5701 [light-cluster] [4.2.1] We should merge to [10.41.31.101]:5701, both have the same data member count: 1
2021-07-20 09:15:08.424 DEBUG 141 --- [hz.hazelcast-instance.priority-generic-operation.thread-0] c.h.i.c.i.o.SplitBrainMergeValidationOp : [10.41.31.102]:5701 [light-cluster] [4.2.1] Returning SplitBrainJoinMessage{packetVersion=4, buildNumber=20210630, memberVersion=4.2.1, clusterVersion=4.2, address=[10.41.31.102]:5701, uuid='9cdd64b4-62c8-4f19-bf29-d3cef4e8e2f6', liteMember=false, memberCount=1, dataMemberCount=1, memberListVersion=1} to [10.41.31.101]:5701
2021-07-20 09:15:09.148 DEBUG 141 --- [hz.hazelcast-instance.cached.thread-6] c.h.i.p.InternalPartitionService : [10.41.31.102]:5701 [light-cluster] [4.2.1] Checking partition state, stamp: -8661523421455686299
```
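To double-check that the two nodes really reach the same conclusion, here is a simplified model of the decision that the `ClusterJoinManager` lines report. It is a sketch, not Hazelcast's source: it assumes the side with fewer data members merges into the larger one, and that on a tie (as here, `dataMemberCount=1` on both sides) the `[host]:port` strings are compared lexicographically and the greater one merges. The class `MergeDecision` and its `Outcome` values are hypothetical names used only for illustration.

```java
/**
 * Hypothetical, simplified model of the split-brain merge decision seen in the logs.
 * NOT Hazelcast's implementation; the tie-break rule below is an assumption.
 */
public final class MergeDecision {

    /** Outcome from the point of view of the local member. */
    public enum Outcome { LOCAL_SHOULD_MERGE, REMOTE_SHOULD_MERGE }

    /** Formats an address the way it appears in the logs, e.g. "[10.41.31.101]:5701". */
    static String format(String host, int port) {
        return "[" + host + "]:" + port;
    }

    public static Outcome decide(String localHost, int localPort, int localDataMembers,
                                 String remoteHost, int remotePort, int remoteDataMembers) {
        if (localDataMembers != remoteDataMembers) {
            // The cluster with fewer data members merges into the bigger one.
            return localDataMembers < remoteDataMembers
                    ? Outcome.LOCAL_SHOULD_MERGE : Outcome.REMOTE_SHOULD_MERGE;
        }
        String local = format(localHost, localPort);
        String remote = format(remoteHost, remotePort);
        if (local.equals(remote)) {
            throw new IllegalArgumentException("Both sides report the same address: " + local);
        }
        // Assumed tie-break: the lexicographically greater address merges into the other.
        return local.compareTo(remote) > 0
                ? Outcome.LOCAL_SHOULD_MERGE : Outcome.REMOTE_SHOULD_MERGE;
    }

    public static void main(String[] args) {
        // Values taken from the two log excerpts (both sides: dataMemberCount=1).
        System.out.println("node .101: " + decide("10.41.31.101", 5701, 1, "10.41.31.102", 5701, 1));
        System.out.println("node .102: " + decide("10.41.31.102", 5701, 1, "10.41.31.101", 5701, 1));
    }
}
```

Under these assumptions, node `.101` gets `REMOTE_SHOULD_MERGE` and node `.102` gets `LOCAL_SHOULD_MERGE`, which matches the two INFO lines: the decision itself is consistent, so the problem seems to be that the agreed merge is never carried out.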