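I am currently trying to implement HA failover on AWS using 3 EC2 instances. Say the three machines are named HA1, HA2 and HA3. HA1 has the Elastic IP, and the other two have standard public IPs for SSH access. I have followed these three resources: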
- https://medium.com/@2infiniti/creating-highly-available-nodes-on-icon-stage-1-active-passive-failover-with-pacemaker-and-a9d56b1484da
- https://medium.com/@gt.anand1994/ha-cluster-with-elasticip-using-corosync-and-pacemaker-a013d288ae8
- https://www.howtoforge.com/tutorial/how-to-set-up-nginx-high-availability-with-pacemaker-corosync-and-crmsh-on-ubuntu-1604/#step-configure-corosync
Everything went fine until I ran crm status, which shows the following output on the shell:
Current DC: PRep-01 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 16 15:01:40 2019
Last change: Mon Dec 16 15:01:31 2019 by root via cibadmin on PRep-01
3 nodes configured
1 resource configured
Online: [ PRep-01 PRep-02 PRep-03 ]
Full list of resources:
deneme123 (ocf::heartbeat:awseip): Stopped
As you can see, the main problem is that the resource does not start. I created it with the following command:
sudo crm configure primitive deneme123 ocf:heartbeat:awseip params elastic_ip="xx.xx.xx.xx" awscli="$(which aws)" allocation_id="eipalloc-xxxxxxxxxx" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block" meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
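One detail of the command above that may be worth ruling out: $(which aws) is expanded by the invoking user's shell before sudo runs, so the value stored in the cluster configuration is whatever path that user's PATH resolves to (or an empty string if aws is not found). A minimal sketch of a sanity check, where check_awscli_path is a hypothetical helper name and not part of crmsh or the AWS CLI:

```shell
#!/bin/sh
# Hypothetical helper: verify that a candidate awscli path is absolute
# and executable for the user running this check.
# Usage: check_awscli_path PATH
check_awscli_path() {
    path=$1
    case $path in
        /*) ;;  # absolute path, keep going
        *)  echo "not an absolute path: '$path'" >&2
            return 1 ;;
    esac
    if [ ! -x "$path" ]; then
        echo "not executable: $path" >&2
        return 1
    fi
    echo "$path"
}
```

Running check_awscli_path "$(which aws)" from a root shell on each node would show whether the same binary is reachable there. This assumes the resource agent is executed as root, which is how Pacemaker normally runs agents.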
Finally, when I check the pacemaker status on all three instances, I get the following:
Dec 16 15:01:32 ip-172-31-47-76 crmd[30721]: notice: Result of probe operation for deneme123 on PRep-02: 7 (not ru
Dec 16 15:01:32 ip-172-31-47-76 crmd[30721]: notice: PRep-02-deneme123_monitor_0:5 [ You must specify a region. Yo
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 crmd[30721]: notice: Result of start operation for deneme123 on PRep-02: 7 (not ru
Dec 16 15:01:37 ip-172-31-47-76 crmd[30721]: notice: PRep-02-deneme123_start_0:6 [ You must specify a region. You
Dec 16 15:01:38 ip-172-31-47-76 lrmd[30714]: notice: deneme123_stop_0:30807:stderr [ You must specify a region. Yo
Dec 16 15:01:38 ip-172-31-47-76 lrmd[30714]: notice: deneme123_stop_0:30807:stderr [ You must specify a region. Yo
Dec 16 15:01:38 ip-172-31-47-76 crmd[30721]: notice: Result of stop operation for deneme123 on PRep-02: 0 (ok)
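The "You must specify a region" message is printed by the AWS CLI when it finds no region in its environment or in the config file of the user it runs as. The sketch below is one way to see which region a given config file actually provides. It assumes the usual ~/.aws/config layout ([default] or [profile NAME] sections), and region_in_config is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the region set for a profile in an
# AWS CLI config file. Returns non-zero if none is found.
# Usage: region_in_config FILE [PROFILE]
region_in_config() {
    file=$1
    profile=${2:-default}
    # The default profile is "[default]"; named ones are "[profile NAME]".
    if [ "$profile" = "default" ]; then
        header="[default]"
    else
        header="[profile $profile]"
    fi
    [ -r "$file" ] || return 1
    awk -v header="$header" '
        # Track whether the current section is the one we want.
        /^\[/ { in_section = ($0 == header); next }
        in_section && /^[ \t]*region[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop "region =" prefix
            sub(/[ \t\r]+$/, "")       # drop trailing whitespace
            print
            found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}
```

Comparing region_in_config "$HOME/.aws/config" for the login user with region_in_config /root/.aws/config from a root shell would show whether root has a region at all. This assumes the agent's AWS CLI calls run as root and therefore do not read the config of the user who ran aws configure.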
I have already run aws configure and entered the region, and I can see the region in ~/.aws/config. I have also added AWS_DEFAULT_REGION=eu-xx-1 to the /etc/systemd/system/multi-user.target.wants/pacemaker.service file.
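For reference, systemd only honours environment settings written as Environment=NAME=value inside the [Service] section, and it picks up unit changes only after a daemon-reload and a service restart. The path under multi-user.target.wants is normally just a symlink to the installed unit, so a drop-in is the more usual place for such a setting. A minimal sketch, where write_region_dropin and the file name region.conf are hypothetical choices:

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets
# AWS_DEFAULT_REGION for a service.
# Usage: write_region_dropin DROPIN_DIR REGION
write_region_dropin() {
    dir=$1
    region=$2
    [ -n "$dir" ] && [ -n "$region" ] || return 1
    mkdir -p "$dir" || return 1
    # Environment= must sit under [Service] to take effect.
    printf '[Service]\nEnvironment=AWS_DEFAULT_REGION=%s\n' "$region" \
        > "$dir/region.conf"
}
```

As root, this would be called as write_region_dropin /etc/systemd/system/pacemaker.service.d eu-xx-1, followed by systemctl daemon-reload and systemctl restart pacemaker. Afterwards, systemctl show pacemaker -p Environment prints what the service actually receives. Whether the variable then reaches the awseip agent is an assumption here, not something the logs above confirm.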
So what is going wrong here? I cannot see anything wrong with the AWS region configuration. What is causing this error?