1

Redis 设置和 twemproxy:
我们有六个 Redis(32 位)版本 3.2.3 实例与 Twemproxy 一起在同一台服务器(AWS EC2 实例 r3.xlarge)上运行。它运行顺利,直到几天。个别实例已开始出现 OOM 问题。无法确定内存不足的确切问题。

Redis 配置:

  1. BG 保存每半小时通过一个脚本为每个实例按顺序触发一次。它在 conf 文件中被关闭。
  2. 将所有六个 redis 实例的 maxmemory 保持在 3.5 GB。
  3. 易变 TTL 的驱逐策略
  4. 到目前为止,每个 redis 实例的峰值使用量在 2.2GB 左右,因此仍未达到 3.5 maxmemory 的限制。

我搜索并修复的内容:

  1. 最大连接数增加到 65535
  2. vm.overcommit_memory 为 1(这可能是 OOM 的原因吗?)
  3. 禁用 THP
  4. 增加 ulimit

据了解,如果 Redis 需要内存,它应该从缓冲内存中使用它。免费命令的输出:

             total       used       free     shared    buffers    cached
Mem:         30679      27430       3249          0        288      11481
-/+ buffers/cache:      15660      15019
Swap:            0          0          0

top 命令的输出: top 命令输出

Redis 的错误日志:

1491:M 05 Jul 21:01:55.090 * Background saving terminated with success
1491:M 05 Jul 21:19:32.121 # Out Of Memory allocating 32774 bytes!


=== REDIS BUG REPORT START: Cut & paste starting from here ===
1491:M 05 Jul 21:19:32.121 # ------------------------------------------------
1491:M 05 Jul 21:19:32.121 # !!! Software Failure. Press left mouse button to continue
1491:M 05 Jul 21:19:32.122 # Guru Meditation: "Redis aborting for OUT OF MEMORY" #server.c:3817
1491:M 05 Jul 21:19:32.122 # (forcing SIGSEGV in order to print the stack trace)
1491:M 05 Jul 21:19:32.122 # ------------------------------------------------
1491:M 05 Jul 21:19:32.123 # Redis 3.2.3 crashed by signal: 11
1491:M 05 Jul 21:19:32.123 # Crashed running the instuction at: 0x80a5c4d
1491:M 05 Jul 21:19:32.123 # Accessing address: 0xffffffff
1491:M 05 Jul 21:19:32.123 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/bin/redis-server *:6381(_serverPanic+0x8d)[0x80a5c4d]

Backtrace:
/usr/local/bin/redis-server *:6381(logStackTrace+0x35)[0x80a7675]
/usr/local/bin/redis-server *:6381(sigsegvHandler+0xc2)[0x80a7bf2]
[0xf77dc420]
/usr/local/bin/redis-server *:6381(_serverPanic+0x8d)[0x80a5c4d]
/usr/local/bin/redis-server *:6381(redisOutOfMemoryHandler+0x3b)[0x806807b]
/usr/local/bin/redis-server *:6381(zmalloc+0x61)[0x8072621]
/usr/local/bin/redis-server *:6381(sdsMakeRoomFor+0x185)[0x8070245]
/usr/local/bin/redis-server *:6381(readQueryFromClient+0x87)[0x807b447]
/usr/local/bin/redis-server *:6381(aeProcessEvents+0x291)[0x80644f1]
/usr/local/bin/redis-server *:6381(aeMain+0x2c)[0x806477c]
/usr/local/bin/redis-server *:6381(main+0x454)[0x80614a4]
/lib32/libc.so.6(__libc_start_main+0xf3)[0xf75ddad3]
/usr/local/bin/redis-server *:6381[0x806175c]

------ INFO OUTPUT ------
# Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:bcabd22adb39b278
redis_mode:standalone
os:Linux 3.13.0-74-generic x86_64
arch_bits:32
multiplexing_api:epoll
gcc_version:4.8.4
process_id:1491
run_id:653209594a744f112fb3c7915d3c349a2c65b912
tcp_port:6381
uptime_in_seconds:26753226
uptime_in_days:309
hz:10
lru_clock:6117348
executable:/usr/local/bin/redis-server
config_file:/etc/redis/redis_6381.conf

# Clients
connected_clients:20
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:2326410216
used_memory_human:2.17G
used_memory_rss:2691825664
used_memory_rss_human:2.51G
used_memory_peak:2340860240
used_memory_peak_human:2.18G
total_system_memory:2105458688
total_system_memory_human:1.96G
used_memory_lua:25600
used_memory_lua_human:25.00K
maxmemory:3758096384
maxmemory_human:3.50G
maxmemory_policy:volatile-ttl
mem_fragmentation_ratio:1.16
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:3132
rdb_bgsave_in_progress:0
rdb_last_save_time:1499288515
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:34
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:171113
total_commands_processed:82044062
instantaneous_ops_per_sec:47
total_net_input_bytes:127272847135
total_net_output_bytes:565881208424
instantaneous_input_kbps:20.80
instantaneous_output_kbps:1176.51
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:7334558
evicted_keys:0
keyspace_hits:35375621
keyspace_misses:8547031
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:44027
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:11970.49
used_cpu_user:8139.96
used_cpu_sys_children:12063.56
used_cpu_user_children:106498.20

# Commandstats
cmdstat_get:calls=41565702,usec=241468654,usec_per_call=5.81
cmdstat_set:calls=8389482,usec=67020049,usec_per_call=7.99
cmdstat_del:calls=591225,usec=2263300,usec_per_call=3.83
cmdstat_exists:calls=287486,usec=828495,usec_per_call=2.88
cmdstat_sadd:calls=10618626,usec=25885617,usec_per_call=2.44
cmdstat_sismember:calls=771569,usec=897763,usec_per_call=1.16
cmdstat_smembers:calls=174,usec=5667,usec_per_call=32.57
cmdstat_expire:calls=16306131,usec=37164332,usec_per_call=2.28
cmdstat_scan:calls=243377,usec=3670788,usec_per_call=15.08
cmdstat_auth:calls=14946,usec=32970,usec_per_call=2.21
cmdstat_bgsave:calls=14805,usec=295936307,usec_per_call=19988.94
cmdstat_lastsave:calls=75581,usec=188956,usec_per_call=2.50
cmdstat_flushall:calls=6,usec=406220,usec_per_call=67703.33
cmdstat_info:calls=118318,usec=5922315,usec_per_call=50.05
cmdstat_monitor:calls=17,usec=19,usec_per_call=1.12
cmdstat_ttl:calls=3046578,usec=7189430,usec_per_call=2.36
cmdstat_command:calls=39,usec=19330,usec_per_call=495.64

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=394418,expires=353679,avg_ttl=5553166456
hash_init_value: 1473181388

------ CLIENT LIST OUTPUT ------
id=12 addr=127.0.0.1:44210 fd=16 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=13 addr=127.0.0.1:44211 fd=17 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=14 addr=127.0.0.1:44212 fd=18 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=15 addr=127.0.0.1:44213 fd=19 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=16 addr=127.0.0.1:44214 fd=20 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=17 addr=127.0.0.1:44215 fd=21 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=18 addr=127.0.0.1:44219 fd=22 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=set
id=19 addr=127.0.0.1:44220 fd=23 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=20 addr=127.0.0.1:44221 fd=24 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get
id=21 addr=127.0.0.1:44222 fd=25 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=2 addr=127.0.0.1:44142 fd=6 name= age=26752852 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=3 addr=127.0.0.1:44199 fd=7 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=4 addr=127.0.0.1:44201 fd=8 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=5 addr=127.0.0.1:44202 fd=9 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=6 addr=127.0.0.1:44203 fd=10 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=7 addr=127.0.0.1:44204 fd=11 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=8 addr=127.0.0.1:44205 fd=12 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=9 addr=127.0.0.1:44207 fd=13 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=10 addr=127.0.0.1:44208 fd=14 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=11 addr=127.0.0.1:44209 fd=15 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get

------ REGISTERS ------
1491:M 05 Jul 21:19:32.131 #
EAX:00000000 EBX:00000000 ECX:f776b420 EDX:00000000
EDI:00000002 ESI:00008006 EBP:00000005 ESP:ff881300
SS :0000002b EFL:ff881300 EIP:080a5c4d CS:00000023
DS :0000002b ES :0000002b FS :00000000 GS:00000063
1491:M 05 Jul 21:19:32.131 # (ff88130f) -> f7010c09
1491:M 05 Jul 21:19:32.131 # (ff88130e) -> 00000ee9
1491:M 05 Jul 21:19:32.131 # (ff88130d) -> 0812a6a0
1491:M 05 Jul 21:19:32.131 # (ff88130c) -> 081280e0
1491:M 05 Jul 21:19:32.131 # (ff88130b) -> 0806807b
1491:M 05 Jul 21:19:32.131 # (ff88130a) -> 00000000
1491:M 05 Jul 21:19:32.131 # (ff881309) -> 00000002
1491:M 05 Jul 21:19:32.131 # (ff881308) -> f7010c20
1491:M 05 Jul 21:19:32.131 # (ff881307) -> e3769a00
1491:M 05 Jul 21:19:32.131 # (ff881306) -> 00000000
1491:M 05 Jul 21:19:32.131 # (ff881305) -> 00008006
1491:M 05 Jul 21:19:32.131 # (ff881304) -> 00000ee9
1491:M 05 Jul 21:19:32.131 # (ff881303) -> 0812a6a0
1491:M 05 Jul 21:19:32.131 # (ff881302) -> 081280e0
1491:M 05 Jul 21:19:32.131 # (ff881301) -> 08131cac
1491:M 05 Jul 21:19:32.131 # (ff881300) -> 00000003

------ FAST MEMORY TEST ------
1491:M 05 Jul 21:19:32.133 # Bio thread for job type #0 terminated
1491:M 05 Jul 21:19:32.134 # Bio thread for job type #1 terminated
*** Preparing to test memory region 200000 (132120576 bytes)
*** Preparing to test memory region 816b000 (49152 bytes)
*** Preparing to test memory region 8200000 (14680064 bytes)
*** Preparing to test memory region 91a5000 (135168 bytes)
*** Preparing to test memory region 9200000 (3972005888 bytes)
*** Preparing to test memory region f5fff000 (8388608 bytes)
*** Preparing to test memory region f6800000 (8388608 bytes)
*** Preparing to test memory region f7000000 (4194304 bytes)
*** Preparing to test memory region f75c3000 (4096 bytes)
*** Preparing to test memory region f776c000 (12288 bytes)
*** Preparing to test memory region f7789000 (8192 bytes)
*** Preparing to test memory region f77d1000 (4096 bytes)
*** Preparing to test memory region f77da000 (4096 bytes)
*** Preparing to test memory region f77db000 (4096 bytes)
*** Preparing to test memory region f7800000 (134217728 bytes)
*** Preparing to test memory region ffa00000 (4194304 bytes)
4

1 回答 1

1

学习很少——

  • Redis 32 位实例最多可使用 4G 内存。如果使用 4G,它不会使用交换空间或 vm 过量使用内存。

  • 当实际使用量小于 4G 即 2-2.5 G 时,我们应该使用 Redis 32 位实例。剩余的空间被输入缓冲区等其他东西使用。

  • 推荐的最大内存为 3G,但在我们的案例中,它仍然提供接近 2.8-3 G 的内存 OOM。所以应该比这个少。

  • 要监控 redis 的使用情况,我们应该始终查看 RSS(驻留集大小)。

于 2017-08-26T14:26:20.723 回答