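The solution was to use taints and tolerations together with node affinity. We created a second node pool and added a taint to the preemptible pool.
Terraform configuration: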
resource "google_container_node_pool" "preemptible_worker_pool" {
  node_config {
    ...
    preemptible = true
    labels {
      preemptible = "true"
      dedicated   = "preemptible-worker-pool"
    }
    taint {
      key    = "dedicated"
      value  = "preemptible-worker-pool"
      effect = "NO_SCHEDULE"
    }
  }
}
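We then used a toleration and nodeAffinity to let our existing workloads run on the tainted node pool. The toleration permits them to land on the tainted nodes, and the required node affinity restricts them to those nodes. Cluster-critical pods carry no such toleration, so the taint keeps them off the preemptible pool and effectively forces them onto the untainted (non-preemptible) node pool.
Kubernetes configuration: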
spec:
  template:
    spec:
      # The affinity + tolerations sections together allow and enforce that the workers are
      # run on dedicated nodes tainted with "dedicated=preemptible-worker-pool:NoSchedule".
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated
                operator: In
                values:
                - preemptible-worker-pool
      tolerations:
      - key: dedicated
        operator: "Equal"
        value: preemptible-worker-pool
        effect: "NoSchedule"
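
To make the interaction between the taint, the toleration, and the node affinity concrete, here is a minimal Python sketch of the scheduling decision. It is a simplified model, not the real Kubernetes scheduler: it only considers NoSchedule taints and "In"-style required node affinity, and the node names and the kube-dns pod are hypothetical examples that do not come from our cluster.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)


@dataclass
class Pod:
    name: str
    tolerations: list = field(default_factory=list)
    # Simplified required node affinity: label key -> set of allowed values ("In").
    required_labels: dict = field(default_factory=dict)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(pod: Pod, node: Node) -> bool:
    """Simplified check: every NoSchedule taint must be tolerated and
    every required label must be present with an allowed value."""
    for taint in node.taints:
        if taint.effect == "NoSchedule" and not any(
            tolerates(t, taint) for t in pod.tolerations
        ):
            return False
    return all(
        node.labels.get(key) in allowed
        for key, allowed in pod.required_labels.items()
    )


# Hypothetical nodes mirroring the two pools described above.
preemptible_node = Node(
    name="preemptible-1",
    labels={"preemptible": "true", "dedicated": "preemptible-worker-pool"},
    taints=[Taint("dedicated", "preemptible-worker-pool", "NoSchedule")],
)
default_node = Node(name="default-1")

toleration = Toleration("dedicated", "Equal", "preemptible-worker-pool", "NoSchedule")

# A worker with both the toleration and the required affinity.
worker = Pod(
    name="worker",
    tolerations=[toleration],
    required_labels={"dedicated": {"preemptible-worker-pool"}},
)
# A worker with the toleration only, to show why the affinity is needed.
tolerant_only = Pod(name="tolerant-only", tolerations=[toleration])
# A cluster-critical pod with neither (hypothetical example).
critical = Pod(name="kube-dns")

for pod in (worker, tolerant_only, critical):
    for node in (preemptible_node, default_node):
        print(f"{pod.name} -> {node.name}: {can_schedule(pod, node)}")
```

In this model the toleration alone only permits a pod to run on the preemptible node; it would still be free to land on the default pool. Adding the required node affinity is what pins the workers to the preemptible pool, while pods without the toleration are kept off it by the taint.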