We run Redis 2.8 on Windows on our production servers, with the following master-slave configuration:
- 3 Redis services on separate servers (1 master and 2 slaves)
- 3 sentinels, one on each server
So far we don't have much load or a large data set, and replication works as we need. BUT every day we see strange behaviour. In the Redis service log we see the following:
IPs of our servers:
---.--.--.1
---.--.--.2
---.--.--.3
[3576] 30 Nov 00:15:07.271 # Connection with slave ---.--.-.2:6379 lost.
[3576] 30 Nov 00:15:07.282 # Connection with slave ---.--.-.3:6379 lost.
[3576] 30 Nov 00:15:18.242 * SLAVE OF ---.--.-.3:6379 enabled (user request)
[3576] 30 Nov 00:15:18.245 # CONFIG REWRITE executed with success.
[3576] 30 Nov 00:15:18.801 * Connecting to MASTER ---.--.-.3:6379
[3576] 30 Nov 00:15:18.801 * MASTER <-> SLAVE sync started
[3576] 30 Nov 00:15:18.802 * Non blocking connect for SYNC fired the event.
[3576] 30 Nov 00:15:18.802 * Master replied to PING, replication can continue...
[3576] 30 Nov 00:15:18.803 * Partial resynchronization not possible (no cached master)
[3576] 30 Nov 00:15:18.805 * Full resync from master: 2010f343b5b051924fb4826f826979f9683a73a9:24474754143
[3576] 30 Nov 00:15:20.772 * MASTER <-> SLAVE sync: receiving 66333159 bytes from master
[3576] 30 Nov 00:15:21.648 * MASTER <-> SLAVE sync: Flushing old data
[3576] 30 Nov 00:15:22.085 * MASTER <-> SLAVE sync: Loading DB in memory
[3576] 30 Nov 00:15:23.005 * MASTER <-> SLAVE sync: Finished with success
The other two servers log the same thing, except that instead of
[3576] 30 Nov 00:15:07.271 # Connection with slave ---.--.-.2:6379 lost.
they log
[6176] 30 Nov 00:15:12.451 # Connection with master lost.
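To correlate these events across the three servers, it can help to turn each log into a timeline of offsets from the first event. The sketch below assumes the line layout shown in the excerpts (`[pid] day month time marker message`); `parse_line` and `timeline` are hypothetical helper names, not part of Redis.

```python
import re
from datetime import datetime

# Layout of the Redis 2.8 log lines in the excerpts above:
# [pid] DD Mon HH:MM:SS.mmm <level marker> message
LINE_RE = re.compile(
    r"^\[(?P<pid>\d+)\] "
    r"(?P<ts>\d{1,2} [A-Za-z]{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<marker>[.*#-]) (?P<msg>.*)$"
)


def parse_line(line):
    """Return (pid, timestamp, marker, message), or None for non-log lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The log has no year; a fixed leap year is prepended so that a
    # "29 Feb" entry would still parse. Only relative offsets are used.
    ts = datetime.strptime("2000 " + m.group("ts"), "%Y %d %b %H:%M:%S.%f")
    return int(m.group("pid")), ts, m.group("marker"), m.group("msg")


def timeline(lines):
    """Return (seconds since first parsed event, message) pairs."""
    events = [e for e in map(parse_line, lines) if e is not None]
    if not events:
        return []
    start = events[0][1]
    return [(round((ts - start).total_seconds(), 3), msg)
            for _pid, ts, _marker, msg in events]
```

Applied to the master excerpt, for example, this shows that the `SLAVE OF ... enabled (user request)` line arrives about 11 seconds after the two slave connections are lost.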
The sentinel logs show the following:
[6988] 30 Nov 00:06:25.031 # -sdown slave ---.--.-.3:6379 ---.--.-.3 6379 @ master ---.--.-.1 6379
[6988] 30 Nov 00:06:25.031 # -sdown sentinel ---.--.-.3:10001 ---.--.-.3 10001 @ master ---.--.-.1 6379
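In Sentinel's log, a `+` prefix marks an instance entering a state and a `-` prefix marks it leaving that state, so `-sdown` means the instance is no longer considered subjectively down. To line these events up with the server logs, a small parser can pull out the event, role and instance. This is a sketch that assumes the layout of the two lines above (with the `@ <master> <ip> <port>` suffix present only for non-master instances); `SentinelEvent` and `parse_sentinel_event` are hypothetical names, and events with other layouts (such as `+switch-master`) are deliberately returned as `None`.

```python
import re
from typing import NamedTuple, Optional


class SentinelEvent(NamedTuple):
    entering: bool  # True for "+sdown"-style events, False for "-sdown"
    event: str      # event name without the sign, e.g. "sdown"
    role: str       # "master", "slave" or "sentinel"
    name: str       # instance name as logged, e.g. "host:port"
    master: str     # name of the monitored master this instance belongs to


# "<sign><event> <role> <name> <ip> <port>", followed by
# " @ <master-name> <master-ip> <master-port>" for non-master instances.
EVENT_RE = re.compile(
    r"# (?P<sign>[+-])(?P<event>[\w-]+) (?P<role>master|slave|sentinel) "
    r"(?P<name>\S+) (?P<ip>\S+) (?P<port>\d+)"
    r"(?: @ (?P<master>\S+) \S+ \d+)?\s*$"
)


def parse_sentinel_event(line: str) -> Optional[SentinelEvent]:
    """Parse one instance event line; return None for anything else."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    # Master instances carry no "@ ..." suffix; they are their own master.
    master = m.group("master") or m.group("name")
    return SentinelEvent(
        entering=m.group("sign") == "+",
        event=m.group("event"),
        role=m.group("role"),
        name=m.group("name"),
        master=master,
    )
```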
This happens ONLY once per day, at the beginning of the day, but not at a fixed time (it can start at 00:12:00, 00:15:00, 00:17:00, etc.). Our redis-sentinel configuration consists of just this:
port 10001
logfile "C:\\REDIS\\logs\\redis_sentinel.log"
sentinel monitor master ---.--.-.1 6379 2
sentinel down-after-milliseconds master 4000
sentinel failover-timeout master 30000
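The two settings above that decide when a failover can start are `down-after-milliseconds` and the quorum (the last argument of `sentinel monitor`). The following is a deliberately simplified model of that decision, not Sentinel's actual code: the helper names are hypothetical, `failover-timeout` is not modelled, and it leaves out that the failover itself must also be authorized by a majority of the Sentinels (leader election).

```python
# Values taken from the sentinel configuration above.
DOWN_AFTER_MS = 4000  # sentinel down-after-milliseconds master 4000
QUORUM = 2            # last argument of: sentinel monitor master ... 6379 2


def is_sdown(ms_since_last_valid_reply: int,
             down_after_ms: int = DOWN_AFTER_MS) -> bool:
    """SDOWN: this Sentinel got no valid reply for longer than down-after-milliseconds."""
    return ms_since_last_valid_reply > down_after_ms


def is_odown(sentinels_reporting_sdown: int, quorum: int = QUORUM) -> bool:
    """ODOWN: at least `quorum` Sentinels agree that the master is SDOWN."""
    return sentinels_reporting_sdown >= quorum


def odown_reached(stall_ms_per_sentinel,
                  down_after_ms: int = DOWN_AFTER_MS,
                  quorum: int = QUORUM) -> bool:
    """Given how long each Sentinel saw the master unresponsive, decide ODOWN."""
    votes = sum(is_sdown(ms, down_after_ms) for ms in stall_ms_per_sentinel)
    return is_odown(votes, quorum)
```

In this model, a stall of just over 4 seconds that two of the three Sentinels observe is already enough to reach ODOWN; a larger `down-after-milliseconds` widens that window.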
For the Redis servers we use an almost default configuration (we only changed "maxheap" and "maxmemory").
I found that possible causes could be an insufficient replication timeout (repl-timeout, 60 seconds by default) or the client output buffer limit ("client-output-buffer-limit"). But neither applies to us: replication is very fast (about 1 second) and the data set is still small. We also checked the physical connectivity to our servers (ports) while the issue was happening, and the connection was always fine.
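For reference, the `client-output-buffer-limit` rule for the slave class can be sketched as follows: a client is disconnected as soon as its output buffer reaches the hard limit, or when it stays at or above the soft limit continuously for longer than the soft-limit seconds. The defaults used (`256mb 64mb 60`, where `mb` means 1024*1024 bytes) are the slave-class values from the stock redis.conf; the function name and the sampled input are hypothetical, since real Redis checks this per client as data is queued rather than from samples.

```python
MB = 1024 * 1024  # redis.conf "mb" unit


def disconnect_time(samples, hard=256 * MB, soft=64 * MB, soft_seconds=60):
    """Simplified client-output-buffer-limit check for one client.

    samples: (t_seconds, buffer_bytes) pairs in increasing time order.
    Returns the sample time at which the client would be dropped, or None.
    A limit of 0 disables that check, as in redis.conf.
    """
    soft_since = None  # start of the current continuous run above the soft limit
    for t, used in samples:
        if hard and used >= hard:
            return t  # hard limit: immediate disconnect
        if soft and used >= soft:
            if soft_since is None:
                soft_since = t  # first time over the soft limit
            elif t - soft_since > soft_seconds:
                return t  # over the soft limit for too long
        else:
            soft_since = None  # dropping below the soft limit resets the timer
    return None
```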
Does anyone have an idea what causes this recurring connection loss and the failover that starts for no apparent reason?