Since upgrading to Postgres 11, my production standby server cannot catch up. In the logs, everything eventually looks fine:
2019-02-06 19:23:53.659 UTC [14021] LOG: consistent recovery state reached at 3C772/8912C508
2019-02-06 19:23:53.660 UTC [13820] LOG: database system is ready to accept read only connections
2019-02-06 19:23:53.680 UTC [24261] LOG: started streaming WAL from primary at 3C772/8A000000 on timeline 1
But the following queries show that all is not well:
warehouse=# SELECT coalesce(abs(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())), -1) / 1024 / 1024 / 1024 AS replication_delay_gbytes;
replication_delay_gbytes
-------------------------
208.2317776754498486
(1 row)
warehouse=# select now() - pg_last_xact_replay_timestamp() AS replication_delay;
replication_delay
-------------------
01:54:19.150381
(1 row)
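For reference, the byte figure is plain LSN arithmetic. Below is a minimal Python sketch of that subtraction, assuming an LSN is a 64-bit WAL position printed as two hexadecimal halves (high 32 bits, a slash, low 32 bits). The helper names lsn_to_int, lsn_diff_bytes and to_gib are made up for illustration and are not part of Postgres. The sample values are the two LSNs from the startup log above.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '3C772/8912C508' to an absolute byte position."""
    high, low = lsn.split("/")
    # High half is the upper 32 bits, low half is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(a: str, b: str) -> int:
    """Byte distance a - b, the same quantity pg_wal_lsn_diff(a, b) reports."""
    return lsn_to_int(a) - lsn_to_int(b)


def to_gib(n_bytes: int) -> float:
    """Scale bytes to GiB, matching the / 1024 / 1024 / 1024 in the query."""
    return n_bytes / 1024 ** 3


# LSNs copied from the startup log lines above.
streaming_start = "3C772/8A000000"
consistent_point = "3C772/8912C508"

gap = lsn_diff_bytes(streaming_start, consistent_point)
print(f"{gap} bytes ({to_gib(gap):.6f} GiB)")
```

The same helpers can be pointed at any pair of LSNs from the logs to see how far apart two WAL positions are without connecting to the server.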
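After a while (a few hours), replication_delay stays the same but replication_delay_gbytes keeps growing. Note that replication_delay is behind from the very beginning, whereas replication_delay_gbytes starts near 0. During startup, there were many messages like these: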
2019-02-06 18:24:36.867 UTC [14036] WARNING: xlog min recovery request 3C734/FA802AA8 is past current point 3C700/371ED080
2019-02-06 18:24:36.867 UTC [14036] CONTEXT: writing block 0 of relation base/16436/2106308310_vm
But a Google search suggests these are harmless.
The replica was created with repmgr, by running pg_basebackup to perform the clone, then starting the replica and watching it catch up. This used to work with Postgres 10.
Any ideas as to why this replica comes up but stays perpetually behind?