4

I read how current master election algorithms like Raft, Paxos or Zab elect master on a cluster and couldn't understand why they use sophisticated algorithms instead of simple bully algorithm.

I'm developing a cluster library and use UDP Multicast for heartbeat messages. Each node joins a multicast address and also send datagram packets periodically to that address. If the nodes find out there is a new node that sends packets to this multicast address, the node is simply added to cluster and similarly when the nodes in the cluster don't get any package from a node, they remove it from the cluster. When I need to choose a master node, I simply iterate over the nodes in the cluster and choose the oldest one.

I read some articles that implies this approach is not effective and more sophisticated algorithms like Paxos should be used in order to elect a master or detect failures via heartbeat messages. I couldn't understand why Paxos is better for split-brain scenarios or other network failures than traditional bully algorithm because I can easily find out when quorum of nodes leave from the cluster without using Raft. The only benefit I see is the count of packets that each server have to handle; only master sends heartbeat messages in Raft while in this case each node has to send heartbeat message to each other. However I don't think this is a problem since I can simply implement similar heartbeat algorithm without changing my master election algorithm.

Can someone elaborate on that?

4

1 回答 1

7

从理论上讲,Raft、Paxos 和 Zab 都不是领导人选举算法。他们解决了一个不同的问题,称为共识。

在您的具体情况下,区别如下。使用leader选举算法,你只能保证最终有一个节点是leader。这意味着在一段时间内,多个节点可能认为他们是领导者,因此可能表现得像一个。相比之下,通过上面的共识算法,你可以保证在一个逻辑时刻最多有一个领导者。

结果是这样的。如果系统的安全性取决于单个领导者的存在,那么仅依靠领导者选举可能会遇到麻烦。考虑一个例子。节点从 UDP 多播接收消息,如果发送者是领导者,则执行 A,但如果发送者不是领导者,则执行 B。如果两个节点在稍微不同的时间点检查集群中最旧的节点,它们可能会看到不同的领导者。然后这两个节点可能会接收到多播消息并以不同的方式对其进行处理,这可能会违反您想要持有的系统的某些安全属性(例如,所有节点要么执行 A 要么执行 B,但从来没有一个执行 A 而另一个执行乙)。

使用 Raft、Paxos 和 Zab,您可以克服这个问题,因为这些算法创建了某种逻辑时期,每个时期最多有一个领导者。

这里有两个注意事项。首先,为同步系统定义了欺凌算法。如果你真的按照 Garcia-Molina 的论文中描述的那样实现它,我相信你可能会在部分同步系统中遇到问题。其次,Zab 算法依赖于一种用于异步系统的欺凌算法。领导者是通过比较他们的历史长度来选出的(这可以最大限度地减少系统的恢复时间)。Once the leader is elected, it tries to start the Zab protocol, which in turn locks the epoch for the leader.

于 2014-12-20T12:45:17.747 回答