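I'm running into an OpenEBS issue in my K8s infrastructure, which is deployed on AWS EKS with 3 nodes. I'm deploying RabbitMQ as a StatefulSet with a single replica. I want the RabbitMQ pod's data to persist when a node goes down and the pod is restarted on another node, so I deployed OpenEBS in the cluster. To test this, I terminated the node the pod was running on, so that the pod would be rescheduled onto another node. However, the pod does not start on the other node. It stays in the ContainerCreating state and shows the following events: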
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m28s default-scheduler Successfully assigned rabbitmq/rabbitmq-0 to ip-10-0-1-132.ap-south-1.compute.internal
Warning FailedAttachVolume 2m28s attachdetach-controller Multi-Attach error for volume "pvc-b62d32f1-de60-499a-94f8-3c4d1625353d" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 2m26s kubelet MountVolume.SetUp failed for volume "rabbitmq-token-m99tw" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 25s kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[configuration data rabbitmq-token-m99tw]: timed out waiting for the condition
After a while (about 5-10 minutes), the rabbitmq pod did start, but I noticed that one cstor-disk-pool pod was failing with the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m7s (x3 over 7m9s) default-scheduler 0/2 nodes are available: 2 node(s) didn't match node selector.
Warning FailedScheduling 44s (x8 over 6m14s) default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
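I ran `kubectl describe` on that cstor-disk-pool pod, and its Node-Selectors key still holds the value of the old (terminated) node. Can anyone help me resolve this? We also need a way to reduce the time it takes for the rabbitmq pod to restart and become ready, because our application cannot tolerate 5-10 minutes of RabbitMQ downtime.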