paxos - How do you handle skipped event numbers with Paxos?

Question

If we are running multi-paxos, a node might see:

Propose(N)
Accept!(N,Vn)
Accept!(N+1,Vm)
Accept!(N+4,Vo) // huh? where is +2, +3?
Accept!(N+5,Vp)

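This could be because:

There is a stable leader, but the node's local network went down or otherwise delayed +2 and +3.
There was an outage, so there were two attempts to propose, and +2 and +3 are failed proposal rounds.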

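In general, operations on a distributed finite state machine do not commute, so a node should apply all operations in order. This means the node needs to be able to tell these two cases apart. If the proposal rounds failed, the node has nothing to worry about. If messages were lost, the node should either wait until they show up or try to recover the missing data (for example, by requesting a snapshot to reinitialize and catch up).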
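To make the in-order requirement concrete, here is a minimal sketch of a log that buffers values by slot and only applies a contiguous prefix. The class name `SlotLog` and its methods are hypothetical, and it assumes the node is told which value was decided for each slot, not merely that an Accept! was seen.

```python
from typing import Any, Callable, Dict, List


class SlotLog:
    """Buffers decided values by slot and applies them strictly in order."""

    def __init__(self, apply_fn: Callable[[Any], None], first_slot: int = 0) -> None:
        self.apply_fn = apply_fn
        self.next_slot = first_slot        # lowest slot not yet applied
        self.pending: Dict[int, Any] = {}  # decided, but blocked behind a gap

    def learn(self, slot: int, value: Any) -> None:
        """Record a decided value, then apply every contiguous slot available."""
        if slot < self.next_slot:
            return  # already applied; ignore the duplicate delivery
        self.pending[slot] = value
        while self.next_slot in self.pending:
            self.apply_fn(self.pending.pop(self.next_slot))
            self.next_slot += 1

    def gaps(self) -> List[int]:
        """Slots below the highest buffered slot that are still unknown locally."""
        if not self.pending:
            return []
        top = max(self.pending)
        return [s for s in range(self.next_slot, top) if s not in self.pending]
```

With N = 0, learning slots 0, 1, 4 and 5 applies only the first two values, and `gaps()` reports slots 2 and 3 as the ones the node still has to resolve one way or another.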

What are the options or strategies for handling this, and what overhead do they incur?

This question was inspired by "In Paxos, can an Acceptor accept a different value after it has already accepted one?"

score 1 · Accepted Answer

I can think of two methods to deal with this.

The simplest approach would be to have the node that is missing +2 and +3 go back and try to propose no-ops in those slots. If a value was already accepted there, the prepare round returns it and the node has to propose that value instead of its no-op, so it learns the decision. Otherwise, no-ops will be decided.
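A minimal sketch of that no-op approach, for a single slot and with in-memory acceptors instead of real network calls. `Acceptor`, `fill_slot` and the `NOOP` sentinel are names made up for illustration; a real system would keep this acceptor state per slot and send the messages over the wire.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

NOOP = "NOOP"  # hypothetical sentinel; the state machine skips it when applying


@dataclass
class Acceptor:
    """Acceptor state for one slot."""
    promised: int = -1
    accepted: Optional[Tuple[int, Any]] = None  # (ballot, value)

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, Any]]]:
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: Any) -> bool:
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def fill_slot(acceptors: List[Acceptor], ballot: int) -> Optional[Any]:
    """Try to decide one slot, proposing NOOP unless a value was already accepted."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None  # a higher ballot is active; retry with a larger ballot
    prior = [acc for acc in promises if acc is not None]
    # Paxos rule: adopt the value accepted under the highest ballot, if any.
    value = max(prior, key=lambda bv: bv[0])[1] if prior else NOOP
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

The overhead is one full prepare/accept round per missing slot, and the node needs a ballot higher than any used so far for that slot, which can disturb a stable leader if followers do this eagerly.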

Another approach would be to have an out-of-band re-learning process. This may be necessary anyway: how does a node catch up if it joins the system after the others?
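As a rough sketch of what such a re-learning step could look like, assume each node keeps a dict from slot number to decided value and can ask a peer for a range of slots. The function names are hypothetical and the "peer" here is just another dict. Copying is safe because a decided value for a slot never changes, so any peer that knows the decision gives the same answer.

```python
from typing import Any, Dict, List


def missing_entries(peer_log: Dict[int, Any], from_slot: int, to_slot: int) -> Dict[int, Any]:
    """What a peer would send back: decided values it knows in [from_slot, to_slot)."""
    return {s: peer_log[s] for s in range(from_slot, to_slot) if s in peer_log}


def catch_up(local_log: Dict[int, Any], peer_log: Dict[int, Any],
             from_slot: int, to_slot: int) -> List[int]:
    """Merge the peer's decided values into local_log; return slots still unknown."""
    local_log.update(missing_entries(peer_log, from_slot, to_slot))
    return [s for s in range(from_slot, to_slot) if s not in local_log]
```

Any slots still returned after asking a peer are either undecided or unknown to that peer too, so the node can ask someone else, wait, or fall back to a snapshot.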

Or you can use a combination of both: the leader proposes no-ops for any holes in its history, while the others use the re-learning process. This is how my paxos system works.
