
I have a K8s cluster with 3 nodes (VMs), installed via kubespray (a kubeadm-based tool), with etcd running on all 3 nodes (untainted master nodes, so they act as both master and worker).

Now I want to replace one VM with another. Is there a direct way to do this? The only workaround I can see is to add 2 nodes via kubespray's scale.yml (e.g. node4 and node5) so that the number of etcd members stays odd, and then remove the extra nodes, node3 and node5, keeping node4.
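Roughly, that workaround would run the two standard kubespray playbooks, scale.yml and remove-node.yml (the inventory path and node names here are illustrative, and depending on the kubespray version, adding etcd members may require cluster.yml instead of scale.yml):

    # add node4 and node5 to the inventory first, then scale out
    # so the etcd member count stays odd (3 -> 5)
    ansible-playbook -i inventory/mycluster/hosts.yaml scale.yml

    # remove node3 and node5 again, keeping node4 (5 -> 3)
    ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -e node=node3,node5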

I don't like this approach.

Any ideas are welcome.

Regards


1 Answer


If you have 3 main (please avoid using master) control plane nodes you should be fine replacing one at a time. While one node is down, etcd still has a majority (2 of 3), so the cluster can keep making decisions and scheduling new workloads; the only thing is that it cannot tolerate a second failure until the replacement has joined.

The recommendation of 5 main nodes is based on the fact that you keep a safety margin: even with one node down you still have a majority to reach quorum on the state decisions for etcd, and you can survive a second, unexpected failure. So if you have 5 nodes and one of them goes down you will still be able to schedule/run workloads.

In other words (a shell sketch of this arithmetic follows the list):

  • 3 main nodes

    • Can tolerate the failure of one node.
    • With one node down, the remaining 2 of 3 members still form a majority, so etcd can still make decisions.
    • A second failure leaves 1 of 3, which loses quorum.
  • 5 main nodes

    • Can tolerate the failure of two nodes.
    • With one node down, decisions can still be made because 4 of 5 members are still available.
    • Even if 2 failures happen, the remaining 3 of 5 still form a majority and quorum is kept; a third failure loses it.
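A minimal shell sketch of that arithmetic, using the standard Raft quorum rules (N here is just a local variable, not anything etcd-specific):

    # quorum = N/2 + 1 (integer division); tolerated failures = N - quorum
    N=3; echo "quorum=$(( N/2 + 1 )) tolerated=$(( N - (N/2 + 1) ))"   # quorum=2 tolerated=1
    N=5; echo "quorum=$(( N/2 + 1 )) tolerated=$(( N - (N/2 + 1) ))"   # quorum=3 tolerated=2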

To summarize, Raft, the consensus protocol used by etcd, tolerates up to (N-1)/2 failures (rounding down) and requires a majority, or quorum, of (N/2)+1 (integer division). The recommended procedure is to update the nodes one at a time: bring one node down, then bring another one up, and wait for it to join the cluster (all control plane components, including the etcd member) before moving on to the next.
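As a sketch, that one-at-a-time replacement with kubespray could look like the following (the inventory path, node names, and etcdctl TLS paths are illustrative and depend on your setup; remove-node.yml and cluster.yml are the standard kubespray playbooks):

    # 1. Remove the old node; remove-node.yml drains it and removes its etcd member:
    ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -e node=node3

    # 2. Add the replacement (node4) to the inventory, then deploy it with
    #    cluster.yml, which also runs the etcd and control plane roles:
    ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml

    # 3. Verify the new etcd member joined and the node is Ready:
    ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/ssl/etcd/ssl/ca.pem \
      --cert=/etc/ssl/etcd/ssl/admin-node4.pem \
      --key=/etc/ssl/etcd/ssl/admin-node4-key.pem \
      member list
    kubectl get nodes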

answered 2020-07-15T20:07:28.303