I have 2 members in a Patroni cluster (1 primary and 1 replica). After the primary reconnected to a new etcd server, I saw these problems in the logs:

ERROR: Request to server http://etcd2:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'etcd2\', port=2379): Max retries exceeded with url: /v2/keys/patroni/patroni-cluster/?recursive=true (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'etcd2\', port=2379): Read timed out. (read timeout=3.333078201239308)"))')

INFO: Reconnection allowed, looking for another server.

INFO: Retrying on http://etcd1:2379

INFO: Selected new etcd server http://etcd1:2379

INFO: Lock owner: patroni2; I am patroni1

INFO: does not have lock

INFO: Reaped pid=3098484, exit status=0

LOG:  received immediate shutdown request

WARNING:  terminating connection because of crash of another server process

DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT:  In a moment you should be able to reconnect to the database and repeat your command.

After this, the replica node became the primary:

INFO: Got response from patroni1 http://0.0.0.0:8008/patroni: {"state": "running", "postmaster_start_time": "2021-08-09 14:43:18.372 UTC", "role": "replica", "server_version": 120003, "cluster_unlocked": true, "xlog": {"received_location": 139045264096, "replayed_location": 139045264096, "replayed_timestamp": "2021-09-27 15:03:10.389 UTC", "paused": false}, "timeline": 30, "database_system_identifier": "6904244251638517787", "patroni": {"version": "1.6.5", "scope": "patroni-cluster"}}

WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"

INFO: promoted self to leader by acquiring session lock

server promoting

LOG:  received promote request

INFO: Lock owner: patroni2; I am patroni2

INFO: no action.  i am the leader with the lock

ERROR:  replication slot "patroni1" does not exist

ERROR:  replication slot "patroni1" does not exist
 
INFO:   acquired session lock as a leader

As you can see above, the new primary no longer sees patroni1. After restoring some WAL, patroni1 wrote the following logs:

INFO: establishing a new patroni connection to the postgres cluster

INFO: My wal position exceeds maximum replication lag

INFO: following a different leader because i am not the healthiest node

INFO: My wal position exceeds maximum replication lag

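From this point on, the log messages no longer change. patroni2 keeps writing "acquired session lock as a leader" and patroni1 keeps writing "My wal position exceeds maximum replication lag". But when I run the command "patronictl -c /patroni.yml list", I don't see them in the Patroni cluster.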
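
To make the "My wal position exceeds maximum replication lag" message easier to reason about, here is a minimal sketch of the comparison it appears to describe. It assumes that Patroni treats a replica as lagging when the leader's last known WAL position minus the replica's own position is greater than the maximum_lag_on_failover setting, and that this setting is 1048576 bytes unless configured otherwise. The helper names (format_lsn, exceeds_max_lag) and the leader position are hypothetical; only the replayed_location value comes from the REST API response logged above.

```python
import json

# Assumed value of maximum_lag_on_failover in bytes; check your own patroni.yml.
ASSUMED_MAXIMUM_LAG_ON_FAILOVER = 1048576


def format_lsn(position: int) -> str:
    """Render an integer WAL position in PostgreSQL's X/Y hexadecimal LSN notation."""
    return f"{position >> 32:X}/{position & 0xFFFFFFFF:X}"


def exceeds_max_lag(leader_position: int, my_position: int,
                    maximum_lag: int = ASSUMED_MAXIMUM_LAG_ON_FAILOVER) -> bool:
    """Return True when the replica is further behind the leader than allowed."""
    return leader_position - my_position > maximum_lag


# Subset of the status payload that patroni1 reported in the log above.
status = json.loads(
    '{"role": "replica", "timeline": 30, '
    '"xlog": {"received_location": 139045264096, "replayed_location": 139045264096}}'
)
replayed = status["xlog"]["replayed_location"]

# Hypothetical leader position, 2 MiB ahead of the replica, for illustration only.
hypothetical_leader_position = replayed + 2 * 1024 * 1024

print("replica replayed LSN:", format_lsn(replayed))
print("lagging:", exceeds_max_lag(hypothetical_leader_position, replayed))
```

If this reading is right, the message would stop only once patroni1 replays enough WAL to fall back within the allowed lag of the leader position recorded in etcd.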

What is the best way to bring them back into the cluster?
