3

我正在使用redis-sentinel2.8 RC5 的 redis 来监视和故障转移我的 redis 节点。设置如下:

哨兵节点

  • 服务器 A
  • 服务器 B
  • 服务器 C

Redis 节点

  • 服务器 A(主)
  • 服务器 B(服务器 A 的从属服务器——slaveof在 redis 配置中使用)

在每个哨兵节点上,我使用相同的配置:

sentinel monitor serverA serverA.mydomain.tld 6379 2
sentinel auth-pass serverA "MYAUTHPASS"
sentinel down-after-milliseconds serverA 10000
sentinel failover-timeout serverA 20000
sentinel can-failover serverA yes
sentinel parallel-syncs serverA 1

sentinel monitor serverB serverB.mydomain.tld 6379 4
sentinel auth-pass serverB "MYAUTHPASS"
sentinel down-after-milliseconds serverB 10000
sentinel failover-timeout serverB 20000
sentinel can-failover serverB yes
sentinel parallel-syncs serverB 5

我正在做这个测试运行:

  1. 启动两个 redis 节点
  2. 启动每个哨兵节点(全部三个)
  3. 停止服务器 A(主服务器)
  4. 服务器 B 成为主服务器
  5. 再次启动服务器 A(主服务器)
  6. 停止服务器 B(从)
  7. 再次启动 Server B (Slave)

现在服务器 A 是服务器 B 的从服务器,服务器 A 是服务器 B 的从服务器。

这是一个错误还是配置错误?

这些是第 3 步(停止服务器 A 上的主服务器)之后的哨兵日志

[19569] 17 Sep 18:33:28.873 # +sdown master serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:28.873 # +sdown master serverB serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:29.073 # +odown master serverA serverA.mydomain.tld 6379 #quorum 3/2
[19569] 17 Sep 18:33:29.073 # +failover-triggered master serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:29.073 # +failover-state-wait-start master serverA serverA.mydomain.tld 6379 #starting in 6543 milliseconds
[19569] 17 Sep 18:33:35.700 # +failover-state-select-slave master serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:35.800 # +selected-slave slave serverB.mydomain.tld:6379 serverB.mydomain.tld 6379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:35.800 * +failover-state-send-slaveof-noone slave serverB.mydomain.tld:6379 serverB.mydomain.tld 6379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:35.900 * +failover-state-wait-promotion slave serverB.mydomain.tld:6379 serverB.mydomain.tld 6379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:36.204 # +promoted-slave slave serverB.mydomain.tld:6379 serverB.mydomain.tld 6379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:36.204 # +failover-state-reconf-slaves master serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:36.302 # +failover-end master serverA serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:36.302 # +switch-master serverA serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:37.196 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:38.107 # +failover-detected master serverB serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:38.206 # +failover-end master serverB serverA.mydomain.tld 6379
[19569] 17 Sep 18:33:38.206 # +switch-master serverB serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:41.322 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:41.322 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:42.105 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:46.329 # +sdown slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:33:48.233 # +sdown slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverB serverB.mydomain.tld 6379

的概述CONFIG GET slaveof

  • 服务器A:不在线
  • 服务器 B:""

这些是再次启动服务器 A(主服务器)后的日志(步骤 5):

[19569] 17 Sep 18:53:08.305 * +demote-old-slave slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:53:08.306 * +demote-old-slave slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 18:53:08.506 # -sdown slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:53:08.506 # -sdown slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 18:53:18.319 * +slave slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:53:18.319 * +slave slave serverA.mydomain.tld:6379 serverA.mydomain.tld 6379 @ serverB serverB.mydomain.tld 6379

的概述CONFIG GET slaveof

  • 服务器 A:"serverB.mydomain.tld"
  • 服务器 B:""

这些是停止服务器 B(从)后的日志:

[19569] 17 Sep 18:58:38.375 # +sdown master serverB serverB.mydomain.tld 6379
[19569] 17 Sep 18:58:38.675 # +sdown master serverA serverB.mydomain.tld 6379
[19569] 17 Sep 18:58:38.876 # +odown master serverA serverB.mydomain.tld 6379 #quorum 3/2

的概述CONFIG GET slaveof

  • 服务器A:("serverB.mydomain.tld"这个时候好像出错了!)
  • 服务器B:不在线

这些是再次启动服务器 B(从)后的日志:

[19569] 17 Sep 19:00:59.892 * +reboot master serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:00:59.892 # +redirect-to-master serverB serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:00:59.892 * +reboot master serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:00:59.892 # +redirect-to-master serverA serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:00.012 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:00.012 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:05.008 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:05.008 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:09.907 # +redirect-to-master serverA serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:09.907 # +redirect-to-master serverB serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:10.029 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:10.029 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:10.046 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:10.046 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:19.925 # +redirect-to-master serverA serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:19.925 # +redirect-to-master serverB serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:20.096 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:20.096 * +sentinel sentinel serverC.mydomain.tld:263
[19569] 17 Sep 19:01:20.096 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:20.096 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:20.143 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:20.143 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:29.943 # +redirect-to-master serverA serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:29.943 # +redirect-to-master serverB serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:30.037 - Reading from client: Connection reset by peer
[19569] 17 Sep 19:01:30.140 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:30.140 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:30.156 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:30.156 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:39.962 # +redirect-to-master serverA serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:39.962 # +redirect-to-master serverB serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:40.169 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:40.169 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:40.191 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:40.191 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:49.980 # +redirect-to-master serverA serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:49.980 # +redirect-to-master serverB serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:50.180 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:50.180 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:50.256 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:50.256 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:01:59.999 # +redirect-to-master serverA serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:01:59.999 # +redirect-to-master serverB serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:02:00.193 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:02:00.193 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:02:00.313 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:02:00.313 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:02:10.013 # +redirect-to-master serverA serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:02:10.013 # +redirect-to-master serverB serverA.mydomain.tld 6379 serverB.mydomain.tld 6379
[19569] 17 Sep 19:02:10.209 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:02:10.209 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379
[19569] 17 Sep 19:02:10.450 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverB.mydomain.tld 6379
[19569] 17 Sep 19:02:10.450 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverB.mydomain.tld 6379

的概述CONFIG GET slaveof

  • 服务器 A:("serverB.mydomain.tld"现在这是错误的!)
  • 服务器 B:("serverA.mydomain.tld"现在这是错误的!)

此外,本节每 5 秒重复一次(无休止地):

[19569] 17 Sep 19:05:40.351 # +redirect-to-master serverA serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:05:40.351 # +redirect-to-master serverB serverB.mydomain.tld 6379 serverA.mydomain.tld 6379
[19569] 17 Sep 19:05:40.723 - Accepted serverB.mydomain.tld:38423
[19569] 17 Sep 19:05:40.724 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:05:40.724 - Accepted serverB.mydomain.tld:38424
[19569] 17 Sep 19:05:40.724 * +sentinel sentinel serverB.mydomain.tld:26379 serverB.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
[19569] 17 Sep 19:05:41.514 - Client closed connection
[19569] 17 Sep 19:05:41.515 - Client closed connection
[19569] 17 Sep 19:05:42.112 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverA serverA.mydomain.tld 6379
[19569] 17 Sep 19:05:42.112 * +sentinel sentinel serverC.mydomain.tld:26379 serverC.mydomain.tld 26379 @ serverB serverA.mydomain.tld 6379
4

2 回答 2

2

您必须在哨兵配置中仅指定主服务器(仅在此处指定主服务器),因为哨兵本身应自动发现从服务器。

它卡住了,因为这里的两者都被视为您的哨兵配置中的主人。

我建议阅读redis-sentinel文档以获取更多信息以及它是如何工作的

于 2014-06-04T10:11:11.100 回答
0

我将修改您的测试程序如下(添加了粗线):

  1. 启动两个 redis 节点
  2. 启动每个哨兵节点(全部三个)
  3. 停止服务器 A(主服务器)
  4. 服务器 B 成为主服务器
  5. 再次启动服务器 A(主服务器)
  6. 在 A 上,运行 slaveof(B)
  7. 等待哨兵集群检测到A
  8. 停止服务器 B(从)
  9. 再次启动 Server B (Slave)
  10. 在 B 上,运行 slaveof(A)

如果您专门尝试测试退化案例,请绝对忽略我的建议。但是如果你试图模拟一个真实的恢复场景,粗体线应该告诉哨兵集群发生了什么。

于 2014-02-03T21:37:54.263 回答