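When I scaled my ElastiCache cluster in AWS from 3 shards to 12, my service started returning HTTP 500 errors and dropping client connections. Checking the logs, I saw the following error: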
https://paste-bin.xyz/14386 (the full stack trace is too large to post here)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:44)
... 46 more
Caused by: io.netty.channel.ChannelException: Failed to open a socket.
at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:71)
at io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:88)
at io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:81)
... 50 more
Caused by: java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:439)
at sun.nio.ch.Net.socket(Net.java:432)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:103)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:69)
... 52 more
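From the trimmed log above, it looks like there are more connection requests than the client can keep up with, but I'm not sure I'm reading this correctly.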
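I reduced the number of shards from 12 to 7 and no longer see the above error in the logs. However, with only 3 shards I get more cache misses. The shard configuration is the same throughout: 1 master node and 3 worker nodes per shard. My machine can handle up to 65535 file descriptors, which I think should be enough for 12 shards. Any pointers to what is going on would be much appreciated!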
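
To narrow this down, it may help to compare a rough estimate of the sockets the client needs against the file descriptor limit the JVM process actually sees, since "Too many open files" is normally raised against the per-process limit, which can be lower than what the machine supports. Below is a minimal sketch of that check, not a diagnosis. The class name `FdHeadroom`, the helper names, and the pool size of 8 connections per node are all hypothetical; the real number of connections per node depends on the Redis client and its pool settings, which are not shown in this post.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdHeadroom {

    // Rough socket count, ASSUMING the client opens connectionsPerNode
    // connections to every node of every shard (a hypothetical model).
    static long estimateSockets(int shards, int nodesPerShard, int connectionsPerNode) {
        return (long) shards * nodesPerShard * connectionsPerNode;
    }

    // Returns {open, max} file descriptor counts for this JVM process,
    // or null when the JVM does not expose the Unix-specific MXBean.
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        int nodesPerShard = 4;      // 1 master + 3 workers, as described above
        int connectionsPerNode = 8; // hypothetical pool size, adjust to the real client config

        // Shard counts mentioned in this post.
        for (int shards : new int[] {3, 7, 12}) {
            System.out.println(shards + " shards -> about "
                + estimateSockets(shards, nodesPerShard, connectionsPerNode)
                + " sockets per client instance");
        }

        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open fds = " + usage[0] + ", process limit = " + usage[1]);
        }
    }
}
```

If the reported process limit is far below 65535, the limit applied to the service process would be the first thing to look at. If the limit is high but the open count keeps growing after the scale-out, that would point more toward connections or event loops being created repeatedly and never closed.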