
We have the following setup:

Redis 2.6 on Ubuntu Linux 12.04 LTS on a Rackspace Cloud 8GB instance, with the following settings:

daemonize yes
pidfile /var/run/redis_6379.pid

port 6379

timeout 300

loglevel notice
logfile /var/log/redis_6379.log

databases 16

save 900 1
save 300 10
save 60 10000

rdbcompression yes
dbfilename dump.rdb
dir /var/redis/6379

requirepass PASSWORD

maxclients 10000

maxmemory 7gb
maxmemory-policy allkeys-lru
maxmemory-samples 3

appendonly no

slowlog-log-slower-than 10000
slowlog-max-len 128

activerehashing yes

Our application servers are hosted at Rackspace Managed and connect to Redis over its public IP (to avoid having to set up RackSpace Connect, a royal PITA), and we provide some security by requiring a password on Redis connections. I manually increased the unix file descriptor limit to 10240; with a 10k connection maximum, that should leave enough headroom. As you can see from the settings file above, I cap memory usage at 7 GB to leave some spare RAM.
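For reference, a sketch of how the raised descriptor limit can be made persistent across restarts (an assumption about our setup: that Redis runs as a dedicated `redis` user; adjust the user name for your init configuration):

```
# /etc/security/limits.conf -- raises the per-process open-file limit for the
# "redis" user so that ~10k client sockets fit under the 10240 descriptor cap
redis soft nofile 10240
redis hard nofile 10240
```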

We use the ServiceStack C# Redis driver, with the following web.config settings:

<RedisConfig suffix="">
  <Primary password="PASSWORD" host="HOST" port="6379"  maxReadPoolSize="50" maxWritePoolSize="50"/>
</RedisConfig>  

We have a PooledRedisClientManager singleton, created once per AppPool, like so:

private static PooledRedisClientManager _clientManager;
public static PooledRedisClientManager ClientManager
{
    get
    {
        if (_clientManager == null)
        {
            try
            {
                var poolConfig = new RedisClientManagerConfig
                {
                    MaxReadPoolSize = RedisConfig.Config.Primary.MaxReadPoolSize,
                    MaxWritePoolSize = RedisConfig.Config.Primary.MaxWritePoolSize,
                };

                _clientManager = new PooledRedisClientManager(new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig);
            }
            catch (Exception e)
            {
                log.Fatal("Could not spin up Redis", e);
                CacheFailed = DateTime.Now;
            }
        }
        return _clientManager;
    }
}

We grab a connection and perform put/get operations like this:

    using (var client = ClientManager.GetClient())
    {
        client.Set<T>(region + key, value);
    }

The code seems to mostly work. Given that we have roughly 20 AppPools, each with 50-100 read and 50-100 write clients, we expect at most 2,000-4,000 connections to the Redis server. Nevertheless, we keep seeing the following exception in our error logs, usually a few hundred of them clumped together, then nothing for an hour, then again, ad nauseam.

System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.ReadByte()
   at ServiceStack.Redis.RedisNativeClient.ReadLine() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 85
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96

We have tried Redis server-side timeouts of 0 (i.e. no connection timeout), of 24 hours, and values in between, with no luck. Googling and Stack Overflowing has not turned up any real answers; everything seems to indicate that we are at least doing the right thing with the code.

Our feeling is that persistent network latency issues between Rackspace Hosted and Rackspace Cloud are causing blocks of TCP connections to go stale. We could probably solve that by implementing client-side connection timeouts; the question is whether we would then also need a server-side timeout. But that is just a feeling, and we are not 100% sure we are on the right track.
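If stale TCP connections are indeed the problem, one server-side mitigation worth trying is TCP keepalives. A sketch of the redis.conf directive (an assumption against our install: `tcp-keepalive` only exists in newer Redis builds, 2.6.16 and later):

```
# Send TCP keepalives every 60s so the OS detects dead peers and Redis can
# reap half-open client connections even with "timeout 0"
tcp-keepalive 60
```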

Ideas?

EDIT: I also occasionally see the following error:

ServiceStack.Redis.RedisException: Unable to Connect: sPort: 65025 ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Send(IList`1 buffers, SocketFlags socketFlags)
   at ServiceStack.Redis.RedisNativeClient.FlushSendBuffer() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 273
   at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 203
   --- End of inner exception stack trace ---
   at ServiceStack.Redis.RedisNativeClient.CreateConnectionError() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 165
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96

I imagine this is a direct result of server-side connection timeouts not being handled on the client. It looks like we really need to handle client-side connection timeouts.
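In the meantime, one client-side guard we are considering is retrying an operation once when the pooled socket turns out to be dead. A minimal sketch, assuming a retry is safe for our idempotent get/set calls (the `RedisRetry` helper below is hypothetical, not part of ServiceStack.Redis):

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: retries an action when a pooled connection has gone
// stale and the first read/write fails with an IOException.
public static class RedisRetry
{
    public static T Execute<T>(Func<T> action, int maxAttempts = 2)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // Stale pooled socket: brief backoff, then try again.
                Thread.Sleep(50);
            }
        }
    }
}
```

The important detail is to call `ClientManager.GetClient()` inside the retried action, so the retry draws a fresh client from the pool instead of reusing the dead socket:

    var value = RedisRetry.Execute(() =>
    {
        using (var client = ClientManager.GetClient())
            return client.Get<string>(key);
    });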


2 Answers


We believe we found the root cause after a more careful reading of the Redis documentation, where we discovered this beauty (http://redis.io/topics/persistence):

RDB needs to fork() often in order to persist on disk using a child process.
Fork() can be time consuming if the dataset is big, and may result in Redis
to stop serving clients for some millisecond or even for one second if the
dataset is very big and the CPU performance not great. AOF also needs to fork()
but you can tune how often you want to rewrite your logs without any trade-off
on durability.

We turned off RDB persistence, and have not seen those connection drops since.
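Concretely, disabling RDB amounts to removing the three `save` lines from the config above; an empty save directive does the same thing:

```
# Disable RDB snapshotting entirely: an empty "save" argument clears all save
# points, so Redis never fork()s for a background RDB dump
save ""
```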

answered 2013-11-20T15:47:19.300

Setting the server timeout to 300 (up from 0) seems to have mitigated the flaky connection failures. We still see some failed connections, but that may be because the PooledRedisClientManager does not properly check the connection state in GetInActiveWriteClient(), which is called from GetClient().

answered 2012-11-01T15:12:56.703