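I'm trying to implement a DLM (distributed lock manager) using the locking mechanism provided by the ServiceStack-Redis library and described here, but I'm finding that the API seems to have a race condition that sometimes grants the same lock to multiple clients.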
    BasicRedisClientManager mgr = new BasicRedisClientManager(redisConnStr);
    using(var client = mgr.GetClient())
    {
        client.Remove("touchcount");
        client.Increment("touchcount", 0);
    }
    Random rng = new Random();
    Action<object> simulatedDistributedClientCode = (clientId) => {
        using(var redisClient = mgr.GetClient())
        {
            using(var mylock = redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(2)))
            {
                long touches = redisClient.Get<long>("touchcount");
                Debug.WriteLine("client{0}: I acquired the lock! (touched: {1}x)", clientId, touches);
                if(touches > 0) {
                    Debug.WriteLine("client{0}: Oh, but I see you've already been here. I'll release it.", clientId);
                    return;
                }
                int arbitraryDurationOfExecutingCode = rng.Next(100, 2500);
                Thread.Sleep(arbitraryDurationOfExecutingCode); // do some work of arbitrary duration
                redisClient.Increment("touchcount", 1);
            }
            Debug.WriteLine("client{0}: Okay, I released my lock, your turn now.", clientId);
        }
    };
    Action<Task> exceptionWriter = (t) => {if(t.IsFaulted) Debug.WriteLine(t.Exception.InnerExceptions.First());};
    int arbitraryDelayBetweenClients = rng.Next(5, 500);
    var clientWorker1 = new Task(simulatedDistributedClientCode, 1);
    var clientWorker2 = new Task(simulatedDistributedClientCode, 2);
    clientWorker1.Start();
    Thread.Sleep(arbitraryDelayBetweenClients);
    clientWorker2.Start();
    Task.WaitAll(
        clientWorker1.ContinueWith(exceptionWriter),
        clientWorker2.ContinueWith(exceptionWriter)
    );
    using(var client = mgr.GetClient())
    {
        var finaltouch = client.Get<long>("touchcount");
        Console.WriteLine("Touched a total of {0}x.", finaltouch);
    }
    mgr.Dispose();
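When I run the code above to simulate two clients attempting the same operation within a short time of each other, there are three possible outputs. The first is the optimal case, where the mutex works properly and the clients proceed in the correct order. The second is where the second client times out waiting to acquire the lock, which is also an acceptable outcome. The problem is the third case: as arbitraryDurationOfExecutingCode approaches or exceeds the timeout for acquiring the lock, it is quite easy to reproduce a situation where the second client is granted the lock before the first client has released it, producing output like this: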
    client1: I acquired the lock! (touched: 0x)
    client2: I acquired the lock! (touched: 0x)
    client1: Okay, I released my lock, your turn now.
    client2: Okay, I released my lock, your turn now.
    Touched a total of 2x.
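To make the suspected failure mode concrete, here is a minimal in-memory sketch with no Redis involved. It rests on an assumption I have not confirmed against the library source: that the value passed as timeOut is also used as the lock's expiry, so a waiting client treats the lock as stale once that time has elapsed. ModelLock, Demo and Client2GetsLockEarly are hypothetical names made up for this illustration, and time is a fake millisecond clock rather than real sleeps.

```csharp
using System;

// Hypothetical in-memory model of a lock whose expiry equals the acquire timeout.
// Nothing here is ServiceStack code; times are plain milliseconds on a fake clock.
public sealed class ModelLock
{
    private long? _expiresAtMs; // null means nobody holds the lock

    // Try to acquire at time nowMs; the lock is ASSUMED to expire timeoutMs later.
    public bool TryAcquire(long nowMs, long timeoutMs)
    {
        bool free = _expiresAtMs == null || _expiresAtMs <= nowMs;
        if (free) _expiresAtMs = nowMs + timeoutMs;
        return free;
    }
}

public static class Demo
{
    // Returns true if client 2 is granted the lock while client 1 is still working.
    public static bool Client2GetsLockEarly(long workMs, long timeoutMs, long client2StartMs)
    {
        var mutex = new ModelLock();
        mutex.TryAcquire(0, timeoutMs);              // client 1 acquires at t = 0
        long deadline = client2StartMs + timeoutMs;  // client 2 gives up after this
        // Client 2 polls every 10 ms, but only while client 1 is still busy.
        for (long t = client2StartMs; t <= deadline && t < workMs; t += 10)
        {
            if (mutex.TryAcquire(t, timeoutMs)) return true; // granted before client 1 finished
        }
        return false;
    }

    public static void Main()
    {
        // Work (2400 ms) outlasts the 2000 ms timeout.
        Console.WriteLine(Client2GetsLockEarly(2400, 2000, 250));
        // Work (500 ms) finishes well inside the timeout.
        Console.WriteLine(Client2GetsLockEarly(500, 2000, 250));
    }
}
```

If the real lock behaves like this model, the double grant is expected whenever the work outlasts the timeout, which would match what I'm seeing. If it does not, the cause is somewhere else.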
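My understanding of the API and its documentation is that the timeOut parameter passed when acquiring the lock is meant to be just that: the timeout for acquiring the lock. If I have to guess at a timeOut value high enough to always exceed the duration of my executing code just to prevent this scenario, that seems pretty error-prone. Does anyone have a workaround other than passing null to wait on the lock forever? I definitely don't want to do that, because I know I'd end up with ghost locks left behind by crashed workers.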
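To illustrate the distinction I'm after, here is a rough in-memory sketch of a lock that takes two separate durations: how long a caller is willing to wait, and how long a grant stays valid (a lease) if the holder never releases it. This is not a ServiceStack API. LeaseLock, LeaseDemo and WaitForLock are hypothetical names, and the clock is again a fake millisecond counter.

```csharp
using System;

// Hypothetical lease-based lock on a fake millisecond clock (not a ServiceStack API).
public sealed class LeaseLock
{
    private string _owner;      // null means the lock is free
    private long _leaseEndsMs;  // time at which the current lease lapses

    // Grants the lock if it is free or the previous lease has lapsed.
    public bool TryAcquire(string owner, long nowMs, long leaseMs)
    {
        if (_owner != null && nowMs < _leaseEndsMs) return false;
        _owner = owner;
        _leaseEndsMs = nowMs + leaseMs;
        return true;
    }

    // Only the current owner may release, so a late release cannot free someone else's lock.
    public bool Release(string owner)
    {
        if (_owner != owner) return false;
        _owner = null;
        return true;
    }
}

public static class LeaseDemo
{
    // Polls every 10 ms from startMs until waitMs has elapsed; returns the grant time or -1.
    public static long WaitForLock(LeaseLock l, string owner, long startMs, long waitMs, long leaseMs)
    {
        for (long t = startMs; t <= startMs + waitMs; t += 10)
        {
            if (l.TryAcquire(owner, t, leaseMs)) return t;
        }
        return -1;
    }

    public static void Main()
    {
        var l = new LeaseLock();
        l.TryAcquire("client1", 0, 10000); // 10 s lease, far longer than the work
        // client2 waits 2 s while client1 still holds a valid lease.
        Console.WriteLine(WaitForLock(l, "client2", 250, 2000, 10000));
        // client1 "crashes" and never releases; client2 retries around the lease end.
        Console.WriteLine(WaitForLock(l, "client2", 9500, 2000, 10000));
    }
}
```

With the two durations decoupled, a short wait can time out without stealing the lock, while a crashed holder's lock still goes away once its lease lapses. The lease still has to be longer than the work, so this narrows the guessing problem rather than removing it.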