我在赞助人集群中有 2 个成员(1 个主节点和 1 个副本)。在主重新连接到新的 etcd 服务器后,我在日志中看到了问题:
ERROR: Request to server http://etcd2:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'etcd2\', port=2379): Max retries exceeded with url: /v2/keys/patroni/patroni-cluster/?recursive=true (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'etcd2\', port=2379): Read timed out. (read timeout=3.333078201239308)"))')
INFO: Reconnection allowed, looking for another server.
INFO: Retrying on http://etcd1:2379
INFO: Selected new etcd server http://etcd1:2379
INFO: Lock owner: patroni2; I am patroni1
INFO: does not have lock
INFO: Reaped pid=3098484, exit status=0
LOG: received immediate shutdown request
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
在此副本节点成为主节点后:
INFO: Got response from patroni1 http://0.0.0.0:8008/patroni: {"state": "running", "postmaster_start_time": "2021-08-09 14:43:18.372 UTC", "role": "replica", "server_version": 120003, "cluster_unlocked": true, "xlog": {"received_location": 139045264096, "replayed_location": 139045264096, "replayed_timestamp": "2021-09-27 15:03:10.389 UTC", "paused": false}, "timeline": 30, "database_system_identifier": "6904244251638517787", "patroni": {"version": "1.6.5", "scope": "patroni-cluster"}}
WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
INFO: promoted self to leader by acquiring session lock
server promoting
LOG: received promote request
INFO: Lock owner: patroni2; I am patroni2
INFO: no action. i am the leader with the lock
ERROR: replication slot "patroni1" does not exist
ERROR: replication slot "patroni1" does not exist
INFO: acquired session lock as a leader
正如你在上面看到的,新主人现在看不到赞助人1。经过几次恢复 wal 赞助人 1 在下面写了这些日志:
INFO: establishing a new patroni connection to the postgres cluster
INFO: My wal position exceeds maximum replication lag
INFO: following a different leader because i am not the healthiest node
INFO: My wal position exceeds maximum replication lag
这些日志信息此时不会更改。赞助人2 写acquired session lock as a leader
和赞助人1 写my wal position exceeds maximum replication lag
。patronictl -c /patroni.yml list
但是当我使用命令时,我在赞助人集群中看不到它们。
我应该如何以更好的方式将它们带回集群?