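I'm testing a toy PostgreSQL setup on some local VMs to see how pgpool behaves on failover. The basic setup has two database machines (192.168.0.2 and 192.168.0.3) and one pgpool machine (192.168.0.4). 192.168.0.3 is set up as a slave of 192.168.0.2 using streaming replication. pgpool-II is configured as follows: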
listen_addresses = '*'
backend_hostname0 = '192.168.0.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.4/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.0.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.4/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
replication_mode = false
master_slave_mode = on
master_slave_sub_mode = 'stream'
fail_over_on_backend_error = true
failover_command = '/root/pgpool_failover_stream.sh %d %H /tmp/postgresql.trigger.5432'
load_balance_mode = false
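I've confirmed that all of this works. That is, replication works when I make changes on the master, and I can connect to the master, the slave, and pgpool-II with a sample application and get the results I expect.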
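Next, I started a long-running application connected to pgpool, then tried to cause a failover by SSHing into the master database server and forcibly stopping postgres (service postgresql stop as root). My application continues to execute queries correctly, but no failover happens (the script is never run). I even tested connecting the application directly to the master database, and in that case stopping the postgres service did eventually crash the application.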
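For context, the failover script itself is not reproduced in this post. A script wired up through failover_command in this way usually looks something like the sketch below, which is an assumption rather than the actual contents of /root/pgpool_failover_stream.sh. It assumes backend 0 is the primary, that the pgpool host can SSH to the standby without a password, and that the standby's recovery.conf names the same trigger file. The function name pgpool_failover and the DRY_RUN switch are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch of a streaming-replication failover script (assumed, not the real one).
# pgpool substitutes the arguments given in failover_command:
#   $1 = %d  id of the backend node that failed
#   $2 = %H  hostname of the new master
#   $3 = path of the trigger file that promotes the standby

pgpool_failover() {
    failed_node=$1
    new_master=$2
    trigger_file=$3

    # Assumption: node 0 is the primary. If the standby failed,
    # there is nothing to promote.
    if [ "$failed_node" != "0" ]; then
        echo "standby node $failed_node is down, nothing to do"
        return 0
    fi

    # Hypothetical DRY_RUN switch: print the command instead of running it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "ssh -T $new_master touch $trigger_file"
        return 0
    fi

    # Creating the trigger file tells the standby to leave recovery.
    ssh -T "$new_master" touch "$trigger_file"
}

# Run only when pgpool (or a person) passes all three arguments.
if [ "$#" -ge 3 ]; then
    pgpool_failover "$@"
fi
```

With DRY_RUN=1 set, calling the function by hand with 0 192.168.0.3 /tmp/postgresql.trigger.5432 only prints the ssh command, which is a quick way to check the arguments without promoting anything.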
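Am I doing something wrong? Is my pgpool misconfigured? Or is there a better way to trigger a failover?
Edit: As requested, here is the part of the log where the first error occurs: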
...
2016-03-15 18:47:15: pid 1232: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1231: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1230: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: LOG: find_primary_node: checking backend no 1
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: DEBUG: find_primary_node: no primary node found
...
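Oddly, I can still connect to pgpool and run queries, so there is clearly something I don't understand here.
Edit 2: These are the errors I get when I run service postgresql shutdown on the master. I'm showing everything up to the point where pgpool starts shutting down.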
...
2016-03-16 17:24:57: pid 1012: DEBUG: session context: clearing doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: reading backend data packet kind
2016-03-16 17:24:57: pid 1012: DETAIL: backend:0 of 2 kind = 'E'
2016-03-16 17:24:57: pid 1012: DEBUG: processing backend response
2016-03-16 17:24:57: pid 1012: DETAIL: received kind 'E'(45) from backend
2016-03-16 17:24:57: pid 1012: ERROR: unable to forward message to frontend
2016-03-16 17:24:57: pid 1012: DETAIL: FATAL error occured on backend
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: decide where to send the queries
2016-03-16 17:24:57: pid 1012: DETAIL: destination = 3 for query= "DISCARD ALL"
2016-03-16 17:24:57: pid 1012: DEBUG: waiting for query response
2016-03-16 17:24:57: pid 1012: DETAIL: waiting for backend:0 to complete the query
2016-03-16 17:24:57: pid 1012: FATAL: unable to read data from DB node 0
2016-03-16 17:24:57: pid 1012: DETAIL: EOF encountered with backend
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler
2016-03-16 17:24:57: pid 998: LOG: child process with pid: 1012 exits with status 256
2016-03-16 17:24:57: pid 998: LOG: fork a new child process with pid: 1033
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler: exiting normally
2016-03-16 17:24:57: pid 1033: DEBUG: initializing backend status
2016-03-16 17:25:02: pid 1031: DEBUG: PCP child receives shutdown request signal 2
2016-03-16 17:25:02: pid 1029: LOG: child process received shutdown request signal 2
...
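Note that my sample application really does die when the master goes down.
Edit 3: These are the errors in the new log, after setting sr_check_period, sr_check_user, and sr_check_password correctly. All of the earlier errors are gone now: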
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: 1
2016-03-31 17:45:00: pid 18363: DEBUG: reading backend data packet kind
2016-03-31 17:45:00: pid 18363: DETAIL: backend:0 of 2 kind = '1'
...
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: S
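For reference, the three settings named in Edit 3 live in pgpool.conf alongside the configuration shown at the top. The fragment below only illustrates their shape. The interval, user name, and password are placeholders, not the values actually used here.

```
# Streaming replication check (placeholder values, assumptions only)
sr_check_period = 10              # seconds between checks; 0 disables the check
sr_check_user = 'pgpool_check'    # hypothetical PostgreSQL role pgpool logs in as
sr_check_password = 'changeme'    # placeholder password for that role
```

The role given in sr_check_user has to be able to log in to both backends for the check to succeed.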