
We are having problems configuring PostgreSQL for HA with Corosync and Pacemaker.

The crm_mon output is:

Last updated: Thu Dec 18 10:24:04 2014
Last change: Thu Dec 18 10:16:30 2014 via crmd on umhtvappdpj05.arqiva.local 
Stack: corosync 
Current DC: umhtvappdpj06.arqiva.local (1) - partition with quorum 
Version: 1.1.10-29.el7-368c726 
2 Nodes configured 
4 Resources configured 


Online: [ umhtvappdpj05.arqiva.local umhtvappdpj06.arqiva.local ] 

Full list of resources: 

 Master/Slave Set: msPostgresql [pgsql] 
     Masters: [ umhtvappdpj06.arqiva.local ] 
     Slaves: [ umhtvappdpj05.arqiva.local ] 
 Resource Group: master-group 
     vip-master (ocf::heartbeat:IPaddr2):       Started umhtvappdpj06.arqiva.local 
     vip-rep    (ocf::heartbeat:IPaddr2):       Started umhtvappdpj06.arqiva.local 

Node Attributes: 
* Node umhtvappdpj05.arqiva.local: 
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : LATEST 
    + pgsql-status                      : HS:alone 
    + pgsql-xlog-loc                    : 0000000097000168 
* Node umhtvappdpj06.arqiva.local: 
    + master-pgsql                      : 1000 
    + pgsql-data-status                 : LATEST 
    + pgsql-master-baseline             : 0000000094000090 
    + pgsql-status                      : PRI 

Migration summary: 
* Node umhtvappdpj05.arqiva.local: 
* Node umhtvappdpj06.arqiva.local:

Here node 06 (umhtvappdpj06.arqiva.local) has come up as the primary and node 05 (umhtvappdpj05.arqiva.local) is acting as the standby, but the two are not connected (the standby reports pgsql-status HS:alone).
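
One way to confirm that the standby really is not streaming is to query pg_stat_replication on the current primary. A minimal check, assuming psql is run as the postgres user on node 06; an empty result means no standby is attached:

# On the current primary (umhtvappdpj06.arqiva.local)
/usr/pgsql-9.3/bin/psql -U postgres -x \
  -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"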

The recovery.conf on node 05:

standby_mode = 'on' 
primary_conninfo = 'host=10.52.6.95 port=5432 user=postgres application_name=umhtvappdpj05.arqiva.local keepalives_idle=60 keepalives_interval=5 keepalives_count=5' 
restore_command = 'scp 10.52.6.85:/var/lib/pgsql/pg_archive/%f %p' 
recovery_target_timeline = 'latest'
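
Since primary_conninfo points at the vip-rep address, it is worth verifying from node 05 that 10.52.6.95 is reachable and accepts connections. A quick sketch, using the addresses from the question and assuming the postgres user may log in; note that a successful SELECT 1 still does not prove that pg_hba.conf on the primary allows a replication connection:

# From the standby (umhtvappdpj05.arqiva.local)
ping -c 3 10.52.6.95
/usr/pgsql-9.3/bin/psql "host=10.52.6.95 port=5432 user=postgres" -c "SELECT 1;"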

The resources were created as follows:

pcs resource create vip-master IPaddr2 \ 
ip="10.52.6.94" \ 
nic="ens192" \ 
cidr_netmask="24" \ 
op start   timeout="60s" interval="0s"  on-fail="restart" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op stop    timeout="60s" interval="0s"  on-fail="block" 

pcs resource create vip-rep IPaddr2 \ 
ip="10.52.6.95" \ 
nic="ens192" \ 
cidr_netmask="24" \ 
meta migration-threshold="0" \ 
op start   timeout="60s" interval="0s"  on-fail="stop" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op stop    timeout="60s" interval="0s"  on-fail="ignore" 

pcs resource create pgsql ocf:heartbeat:pgsql \ 
pgctl="/usr/pgsql-9.3/bin/pg_ctl" \ 
psql="/usr/pgsql-9.3/bin/psql" \ 
pgdata="/pgdata/data" \ 
rep_mode="sync" \ 
node_list="10.52.6.85 10.52.6.92" \ 
restore_command="scp 10.52.6.85:/var/lib/pgsql/pg_archive/%f %p" \ 
master_ip="10.52.6.95" \ 
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \ 
restart_on_promote='true' \ 
op start   timeout="60s" interval="0s"  on-fail="restart" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op monitor timeout="60s" interval="9s"  on-fail="restart" role="Master" \ 
op promote timeout="60s" interval="0s"  on-fail="restart" \ 
op demote  timeout="60s" interval="0s"  on-fail="stop" \ 
op stop    timeout="60s" interval="0s"  on-fail="block" \ 
op notify  timeout="60s" interval="0s" 
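
The crm_mon output also shows the msPostgresql master/slave set and the master-group, which the create commands above do not cover. With pcs on this Pacemaker 1.1.10 stack they are typically set up along these lines (a sketch using the usual values from pgsql RA setup guides, not necessarily the exact commands used here):

pcs resource master msPostgresql pgsql \
  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs resource group add master-group vip-master vip-rep

pcs constraint colocation add master-group with Master msPostgresql INFINITY
pcs constraint order promote msPostgresql then start master-group symmetrical=false score=INFINITY
pcs constraint order demote  msPostgresql then stop  master-group symmetrical=false score=0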

[root@umhtvappdpj05 data]# pcs resource show --all 
 Master/Slave Set: msPostgresql [pgsql] 
     Masters: [ umhtvappdpj06.arqiva.local ] 
     Slaves: [ umhtvappdpj05.arqiva.local ] 
 Resource Group: master-group 
     vip-master (ocf::heartbeat:IPaddr2):       Started 
     vip-rep    (ocf::heartbeat:IPaddr2):       Started 
[root@umhtvappdpj05 data]#

The only anomaly is that Corosync and Pacemaker were first installed on node 6 while it was on a different subnet from node 5; node 6 was later moved onto the same subnet as node 5. Could that be the cause? Perhaps a reinstall on node 6 would make sense.
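
Moving node 6 between subnets should only matter if corosync is still bound to, or configured with, the old address. A quick way to check membership and ring addresses (standard corosync/pcs commands, nothing specific to this setup) is to run the following on both nodes:

# Shows the local node ID and the address each ring is bound to
corosync-cfgtool -s

# Shows which nodes corosync currently considers members
pcs status corosync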

Thank you

Samir


1 Answer


Here is how to get your replica connected to the primary PostgreSQL (a shell sketch of these steps follows the list):

1. Touch a PGSQL.lock file in /var/lib/pgsql/tmp/ on umhtvappdpj05.arqiva.local
2. Stop PostgreSQL on umhtvappdpj05.arqiva.local using systemctl
3. Take a base backup (or rsync the data directory) from the primary server to the replica
4. Remove the PGSQL.lock file from the replica
5. Run pcs resource cleanup pgsql
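
A minimal shell sketch of these steps, run on the replica, assuming the pgsql RA's default tmp directory /var/lib/pgsql/tmp, the PGDATA /pgdata/data and addresses from the question, and the PGDG service name postgresql-9.3; adjust to your environment:

# 1. Create the lock file so the resource agent leaves the instance alone
touch /var/lib/pgsql/tmp/PGSQL.lock

# 2. Stop PostgreSQL on the replica (if it was started by the resource agent
#    rather than systemd, pg_ctl stop may be needed instead)
systemctl stop postgresql-9.3

# 3. Re-seed the data directory from the primary (base backup variant; requires
#    a pg_hba.conf replication entry for postgres on the primary)
rm -rf /pgdata/data/*
/usr/pgsql-9.3/bin/pg_basebackup -h 10.52.6.95 -U postgres -D /pgdata/data -X stream -P

# 4. Remove the lock file again
rm -f /var/lib/pgsql/tmp/PGSQL.lock

# 5. Clear the failure history so Pacemaker starts the slave resource again
pcs resource cleanup pgsql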

These steps have always worked for me. If they don't, let me know.

answered 2020-02-07T06:58:34.683