cluster-computing - pcs does not stop failover resources on the partner node before starting them on the primary node when both machines boot at the same time

Question

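I have only recently started working with clusters, so please let me know if you need more information.

I have an active-active HA cluster that is designed to keep working through failover.

Node1 and Node2 form the active-active cluster, with Pacemaker and corosync as the cluster manager. Each node has one resource group, and each resource group contains three resources.

When Node1 fails, Node2 takes over its resources as expected. When Node1 comes back online, pcs first stops Node1's resources on Node2 and then starts them on Node1, which is also expected and works fine.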

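Problem: things go wrong when both nodes boot at the same time.

Scenario: both nodes are powered off and then powered on at the same time. Suppose Node2 comes up first. pcs sees that Node1 is still offline (still booting) and starts Node1's resources on Node2. It then starts Node2's own resources on Node2 as well.

Once Node1 is fully up, it starts its own resources. The problem is that the Node1 resources currently running on Node2 (the failed-over copies) are not stopped before Node1 starts them.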

So in the end, Node1's resources are running on Node1 while they are also still running on Node2.

This never happens when the nodes boot with a time gap (15 minutes) between them. It also works fine when only one node is rebooted or shut down.

            # pcs property list --all
            Cluster Properties:
            batch-limit: 0
            cluster-delay: 60s
            cluster-infrastructure: cman
            cluster-recheck-interval: 15min
            crmd-finalization-timeout: 30min
            crmd-integration-timeout: 3min
            crmd-transition-delay: 0s
            dc-deadtime: 20s
            dc-version: 1.1.11-97629de
            default-action-timeout: 20s
            default-resource-stickiness: 0
            election-timeout: 2min
            enable-startup-probes: true
            expected-quorum-votes: 2
            is-managed-default: true
            last-lrm-refresh: 1565098302
            load-threshold: 80%
            maintenance-mode: false
            migration-limit: -1
            no-quorum-policy: ignore
            node-action-limit: 0
            node-health-green: 0
            node-health-red: -INFINITY
            node-health-strategy: none
            node-health-yellow: 0
            pe-error-series-max: -1
            pe-input-series-max: 4000
            pe-warn-series-max: 5000
            placement-strategy: default
            remove-after-stop: false
            shutdown-escalation: 20min
            start-failure-is-fatal: true
            startup-fencing: true
            stonith-action: reboot
            stonith-enabled: false
            stonith-timeout: 60s
            stop-all-resources: false
            stop-orphan-actions: true
            stop-orphan-resources: true
            symmetric-cluster: false
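
For anyone comparing this dump against their own cluster, the sketch below shows one way to turn the `pcs property list --all` text into a dictionary and pull out a few entries. The helper names (`parse_properties`, `select`) and the `PROPERTIES_OF_INTEREST` list are made up for illustration. The choice of properties is an assumption about what is worth a second look in a two-node setup, not something pcs reports and not a diagnosis of the problem.

```python
# Hypothetical helper: parse the text printed by `pcs property list --all`
# into a dict, then pick out a few properties for a quick side-by-side check.

# Assumption: this selection is for illustration only; it does not come from
# the question or from pcs itself.
PROPERTIES_OF_INTEREST = (
    "stonith-enabled",
    "no-quorum-policy",
    "default-resource-stickiness",
    "symmetric-cluster",
    "dc-deadtime",
)


def parse_properties(text):
    """Return {name: value} for every 'name: value' line in the dump."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the echoed command, and the 'Cluster Properties:' header.
        if not line or line.startswith("#") or line.endswith(":"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            props[name.strip()] = value.strip()
    return props


def select(props, names=PROPERTIES_OF_INTEREST):
    """Keep only the named properties that are present in the dump."""
    return {name: props[name] for name in names if name in props}


# A shortened copy of the output shown above.
sample = """\
# pcs property list --all
Cluster Properties:
 default-resource-stickiness: 0
 dc-deadtime: 20s
 no-quorum-policy: ignore
 stonith-enabled: false
 symmetric-cluster: false
"""

summary = select(parse_properties(sample))
for name, value in summary.items():
    print(f"{name} = {value}")
```

Running the same two functions on the output from each node makes it easy to spot a property that differs between them.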

score 0 · Accepted Answer

I was able to resolve this by moving to pcs version 0.9.155. Older pcs versions have this bug when both nodes reboot at the same time.
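
To check whether a given node is already on that version, a comparison along these lines may help. It is only a sketch: `parse_version` and `has_fix` are hypothetical names, and it assumes that any version at or above 0.9.155 carries the fix and that the version is a plain dotted string of integers.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.9.155' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# The version named in the answer above as resolving the problem.
FIXED_IN = parse_version("0.9.155")


def has_fix(installed):
    """True if the installed pcs version is at least 0.9.155 (assumed cutoff)."""
    # Tuples compare element by element, so 0.9.90 correctly sorts
    # below 0.9.155, which a plain string comparison would get wrong.
    return parse_version(installed) >= FIXED_IN


for candidate in ("0.9.90", "0.9.155", "0.10.1"):
    print(candidate, has_fix(candidate))
```

The installed version can be read with `pcs --version` and passed to `has_fix` as a string.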
