Suppose I have a 3-node EKS cluster made up of 3 spot instances (we'll call them Node A, B, and C), and each node has critical pods scheduled. The EKS cluster has the EKS Node Termination Handler running. Metadata gets posted saying that in 2 minutes Node A is going to be reclaimed by Amazon.

The Node Termination Handler cordons and drains the node being reclaimed (Node A), and a new node spins up. The pods from Node A are then scheduled on the Node A replacement. If this completes within two minutes, perfect.

Is there a benefit to having spare capacity around (Node D)? If Node A is taken back by Amazon, will my pods be rescheduled on Node D since it is already available?

In this architecture, it seems like a great idea to have a spare node or two around for pod rescheduling so I don't risk missing the 2-minute window. Do I need to do anything special to make sure the pods are rescheduled in the most efficient way?

1 Answer

Is there a benefit to having spare capacity around (Node D)? If Node A is taken back by Amazon, will my pods be rescheduled on Node D since it is already available?

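Yes. As long as the Deployment has no specific constraints attached (such as a node selector, taints, or affinity rules) that rule the node out, the pods will very likely be scheduled on that node.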
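To make the point about constraints concrete, here is a rough Python sketch of how a node selector or a taint can keep a pod off Node D. It is a deliberately simplified model, not the real Kubernetes scheduler: it only handles exact-match node selectors and `NoSchedule` taints, and the `capacity-type` label and all pod and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    # Each taint is (key, value, effect), e.g. ("dedicated", "batch", "NoSchedule").
    taints: list = field(default_factory=list)


@dataclass
class Pod:
    name: str
    node_selector: dict = field(default_factory=dict)
    # Each toleration is (key, value, effect); simplified to exact matches only.
    tolerations: list = field(default_factory=list)


def can_schedule(pod: Pod, node: Node) -> bool:
    """Simplified filter: selector labels must match, NoSchedule taints must be tolerated."""
    for key, value in pod.node_selector.items():
        if node.labels.get(key) != value:
            return False
    for taint in node.taints:
        if taint[2] == "NoSchedule" and taint not in pod.tolerations:
            return False
    return True


# Hypothetical spare node and two pods evicted from Node A.
node_d = Node("node-d", labels={"capacity-type": "on-demand"})
unconstrained = Pod("web")
pinned = Pod("worker", node_selector={"capacity-type": "spot"})

print(can_schedule(unconstrained, node_d))  # True: nothing rules Node D out
print(can_schedule(pinned, node_d))         # False: the selector excludes Node D
```

So a pod without constraints is free to land on Node D, while a pod pinned to a label that Node D does not carry would stay pending until a matching node appears.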

Do I need to do anything special to make sure the pods are rescheduled in the most efficient way?

That sounds good, but what if all 3 pods receive the termination signal at the same time? Can all of them be rescheduled onto new nodes within those 2 minutes?

Will 3 new nodes be available, or only the single Node D?

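You may need to pay attention to how all the pods are sized relative to the number of nodes, and to use properly configured readiness probes with short timings, so that pods come up and start handling traffic as soon as possible.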
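As an illustration of the sizing and readiness advice, here is a minimal Python sketch that builds a Deployment manifest with explicit resource requests and a fast readiness probe, then prints it as JSON. The name, image, port, `/healthz` path, and all numeric values are assumptions chosen for the example; tune them to how quickly your application really starts.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest with resource requests and a quick readiness probe."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8080}],  # hypothetical port
        # Requests tell the scheduler how much room each pod needs on a node.
        "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}},
        # Short probe timings so a rescheduled pod is marked ready quickly.
        "readinessProbe": {
            "httpGet": {"path": "/healthz", "port": 8080},  # hypothetical endpoint
            "initialDelaySeconds": 2,
            "periodSeconds": 3,
            "failureThreshold": 3,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


manifest = build_deployment("web", "example.com/web:1.0", replicas=3)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML.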

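If only the single Node D is running and all 3 spot instances are terminated, that could cause problems. What about the pods you run for things like the Nginx ingress or a service mesh?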
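A quick back-of-the-envelope check shows why a single spare node may not be enough. The Python sketch below compares the CPU requests of the evicted pods with the free CPU on Node D. All numbers are hypothetical, and it looks at CPU only; memory and per-node pod limits would need the same check.

```python
def fits_on_spare(pod_requests_m, spare_allocatable_m, already_requested_m=0):
    """Return (fits, shortfall), with all CPU values in millicores."""
    needed = sum(pod_requests_m)
    free = spare_allocatable_m - already_requested_m
    shortfall = max(0, needed - free)
    return shortfall == 0, shortfall


# Hypothetical numbers: each spot node runs 4 pods requesting 250m CPU each.
pods_per_spot_node = [250] * 4
node_d_allocatable = 1900  # millicores Node D can hand out (assumed)
node_d_in_use = 300        # already requested by system pods on Node D (assumed)

print(fits_on_spare(pods_per_spot_node, node_d_allocatable, node_d_in_use))
# (True, 0): losing only Node A fits on Node D

print(fits_on_spare(pods_per_spot_node * 3, node_d_allocatable, node_d_in_use))
# (False, 1400): losing all three spot nodes leaves 1400m unschedulable
```

With these assumed numbers, Node D absorbs one reclaimed node comfortably, but if all three spot instances go at once, most pods stay pending until replacement nodes join the cluster.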

If the Nginx pods have to be rescheduled, that can sometimes take a few seconds; if they are deployed with a RollingUpdate strategy, that is fine.

Answered 2022-01-11T19:05:48.173