If we are running multi-Paxos, a node might see:

Propose(N)
Accept!(N,Vn)
Accept!(N+1,Vm)
Accept!(N+4,Vo) // huh? where is +2, +3?
Accept!(N+5,Vp)

This could be because:

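  • There is a stable leader, but the node's local network went down or otherwise delayed +2 and +3.
  • There was an outage, so there were two attempts to propose, which makes +2 and +3 failed proposal rounds.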

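In general, operations on a distributed finite state machine do not commute, so a node should apply all operations in order. This means a node needs to be able to tell these two cases apart. If the proposal rounds failed, the node is fine. If messages were lost, the node should either wait until they show up or try to recover the missing data (for example, request a snapshot to reinitialize and catch up).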
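To make the in-order requirement concrete, here is a minimal sketch, assuming each number in the trace above identifies a log slot rather than a ballot. The `SlotLog` class and its method names are hypothetical, not from any Paxos library. It buffers values that arrive early and reports which slots are still missing:

```python
from dataclasses import dataclass, field


@dataclass
class SlotLog:
    """Applies learned values strictly in slot order; buffers early arrivals."""
    next_slot: int = 0                            # lowest slot not yet applied
    pending: dict = field(default_factory=dict)   # slot -> value, waiting on a gap
    applied: list = field(default_factory=list)   # values applied so far, in order

    def learn(self, slot, value):
        if slot < self.next_slot:
            return  # already applied, ignore the duplicate
        self.pending[slot] = value
        # Drain every buffered value that is now contiguous with the applied prefix.
        while self.next_slot in self.pending:
            self.applied.append(self.pending.pop(self.next_slot))
            self.next_slot += 1

    def gaps(self):
        """Slots below the highest buffered slot that have not been learned."""
        if not self.pending:
            return []
        return [s for s in range(self.next_slot, max(self.pending))
                if s not in self.pending]


log = SlotLog()
for slot, value in [(0, "Vn"), (1, "Vm"), (4, "Vo"), (5, "Vp")]:
    log.learn(slot, value)
print(log.applied, log.gaps())
```

Note that this only detects the hole. It cannot say whether slots 2 and 3 were lost messages or were never decided, which is exactly the distinction I am asking about.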

What are the options or strategies for handling this, and what overhead do they incur?

This question was inspired by "In Paxos, can an Acceptor accept a different value after it has already accepted one?"

1 Answer

I can think of two methods to deal with this.

The simplest approach would be to have the node that is missing +2 and +3 go back and try to propose no-ops in those slots. If values were already decided there, the node will learn them in the prepare round. Otherwise, the no-ops will be decided.
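A rough sketch of that no-op fill, assuming one single-decree Paxos instance per slot and in-memory acceptors. `Acceptor`, `fill_slot`, and `NOOP` are illustrative names, and networking, retries, and ballot selection are left out:

```python
NOOP = "noop"


class Acceptor:
    def __init__(self):
        self.promised = {}  # slot -> highest ballot promised
        self.accepted = {}  # slot -> (ballot, value) last accepted

    def prepare(self, slot, ballot):
        if ballot <= self.promised.get(slot, -1):
            return None  # reject: a ballot at least this high was already promised
        self.promised[slot] = ballot
        return self.accepted.get(slot, (-1, None))

    def accept(self, slot, ballot, value):
        if ballot < self.promised.get(slot, -1):
            return False
        self.promised[slot] = ballot
        self.accepted[slot] = (ballot, value)
        return True


def fill_slot(acceptors, slot, ballot):
    """Try to close a hole at `slot`. Returns the value sent to a quorum, or None."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(slot, ballot) for a in acceptors)
                if p is not None]
    if len(promises) < quorum:
        return None  # lost the prepare round; retry with a higher ballot
    # Paxos rule: adopt the value accepted under the highest ballot, if any.
    prev_ballot, prev_value = max(promises, key=lambda p: p[0])
    value = prev_value if prev_ballot >= 0 else NOOP
    acks = sum(a.accept(slot, ballot, value) for a in acceptors)
    return value if acks >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
acceptors[0].accept(2, 1, "Vx")  # slot 2 was decided while we were away
acceptors[1].accept(2, 1, "Vx")
print(fill_slot(acceptors, 2, 5), fill_slot(acceptors, 3, 5))
```

The cost is a full prepare and accept round per missing slot, so this gets expensive if the hole is large.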

Another approach would be to have an out-of-band re-learning process. This may be necessary anyway: how does a node catch up if it joins the system after the others?
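A toy sketch of such a re-learning step, assuming a lagging node can ask an up-to-date peer either for the entries it is missing or, if the peer has already truncated them, for a snapshot. `Peer`, `fetch`, and `catch_up` are made-up names, and the snapshot is modelled simply as the list of values for all earlier slots:

```python
class Peer:
    """An up-to-date replica. `state` stands in for a snapshot of slots < first_slot."""

    def __init__(self, first_slot, state, entries):
        self.first_slot = first_slot
        self.state = list(state)
        self.entries = list(entries)  # chosen values for first_slot, first_slot + 1, ...

    def fetch(self, from_slot):
        if from_slot < self.first_slot:
            # The requested entries were truncated, so hand back the snapshot.
            return ("snapshot", self.first_slot, list(self.state))
        return ("entries", from_slot, self.entries[from_slot - self.first_slot:])


def catch_up(applied, peer):
    """`applied` holds the values applied so far, with slot i at index i."""
    kind, _, payload = peer.fetch(len(applied))
    if kind == "snapshot":
        applied = list(payload)  # reinitialize from the snapshot
        kind, _, payload = peer.fetch(len(applied))
    return applied + payload


peer = Peer(first_slot=2, state=["a", "b"], entries=["c", "d"])
print(catch_up(["a"], peer))
print(catch_up(["a", "b", "c"], peer))
```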

Or you can use a combination of both: the leader proposes no-ops for any holes in its history, and the other nodes use the re-learning process. This is how my Paxos system works.

answered 2014-10-23T03:19:40.723