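I have this pgpool configuration: "Host-1" is the master and "Host-2" is the slave. If "Host-1" goes down, pgpool correctly promotes "Host-2" to master. But when "Host-1" comes back up, pgpool is not aware of it, and if "Host-2" then goes down, pgpool does not promote "Host-1" to master, even though "Host-1" is online. I enabled health_check, but it seems to be completely useless, because the status of "Host-1" (after it comes back up) is always 3 = "Node is down".
This is the output of the command "show pool_nodes" during these events:
-> Initial situation: "Host-1" UP (master), "Host-2" UP (slave)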
 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1   | 5432 | 2      | nan       | master
 1       | Host-2   | 5432 | 1      | nan       | slave
-> Node 0 goes down: "Host-1" DOWN, "Host-2" UP
 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1   | 5432 | 3      | nan       | slave
 1       | Host-2   | 5432 | 2      | nan       | master
-> Node 0 comes back: "Host-1" UP, "Host-2" UP
 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1   | 5432 | 3      | nan       | slave
 1       | Host-2   | 5432 | 2      | nan       | master
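Note that the status of "Host-1" is still 3, which means "Node is down", even though the host itself is back up.
-> Node 1 goes down: "Host-1" UP, "Host-2" DOWN: at this point I cannot connect to the database, even though node 0 is up and running!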
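To make the status column in these outputs easier to follow, here is a minimal Python sketch that decodes it. It assumes the numeric status codes described in the pgpool-II documentation (0 = initializing, 1 = up with no connections yet, 2 = up with pooled connections, 3 = down). The helper names `parse_pool_nodes` and `usable_nodes` are hypothetical and only for illustration; they are not part of pgpool.

```python
# Numeric status codes as described in the pgpool-II docs for "show pool_nodes".
STATUS_MEANING = {
    0: "initializing",
    1: "up, no connections yet",
    2: "up, connections pooled",
    3: "down",
}


def parse_pool_nodes(text):
    """Parse pipe-separated rows of "show pool_nodes" output into dicts.

    Hypothetical helper: expects the six columns shown in this post.
    """
    nodes = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header, the separator line, and anything malformed.
        if len(cells) != 6 or not cells[0].isdigit():
            continue
        node_id, hostname, port, status, _lb_weight, role = cells
        nodes.append({
            "node_id": int(node_id),
            "hostname": hostname,
            "port": int(port),
            "status": int(status),
            "role": role,
        })
    return nodes


def usable_nodes(nodes):
    """Return hostnames that pgpool currently considers up (status 1 or 2)."""
    return [n["hostname"] for n in nodes if n["status"] in (1, 2)]


# Output copied from the "node 0 comes back" step in this post.
SAMPLE = """\
node_id | hostname | port | status | lb_weight | role
---------+----------+------+--------+-----------+--------
0 | Host-1 | 5432 | 3 | nan | slave
1 | Host-2 | 5432 | 2 | nan | master
"""

if __name__ == "__main__":
    for node in parse_pool_nodes(SAMPLE):
        print(node["hostname"], "->", STATUS_MEANING[node["status"]])
    print("usable:", usable_nodes(parse_pool_nodes(SAMPLE)))
```

Run against the "node 0 comes back" output, this lists only "Host-2" as usable, which matches what I observe: once "Host-2" goes down as well, pgpool has no backend it considers up.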
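What do I have to do to allow pgpool to promote node 0 to master again? In case it is useful, these are the "Backend Connection Settings" and "HEALTH CHECK" sections of my pgpool.conf: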
# - Backend Connection Settings -
backend_hostname0 = 'Host-1'
# Host name or IP address to connect to for backend 0
backend_port0 = 5432
# Port number for backend 0
#backend_weight0 = 1
# Weight for backend 0 (only in load balancing mode)
#backend_data_directory0 = '/data'
# Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
# Controls various backend behavior
# ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
backend_hostname1 = 'Host-2'
# Host name or IP address to connect to for backend 1
backend_port1 = 5432
# Port number for backend 1
#backend_weight1 = 1
# Weight for backend 1 (only in load balancing mode)
#backend_data_directory1 = '/data'
# Data directory for backend 1
backend_flag1 = 'ALLOW_TO_FAILOVER'
# Controls various backend behavior
# ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER
#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 10
# Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'admin'
# Health check user
health_check_password = '12345'
# Password for health check user
health_check_max_retries = 10
# Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
# Amount of time to wait (in seconds) between retries.