
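I'm at the end of my rope here :) I have a prototype (too big, with too many dependencies, to share) that uses redis for a number of reasons. One of them is to store a serialized value and to control updates to that value with a guard lock, using LockTake/LockRelease on a separate key.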

The overall application looks a bit like this (note: this snippet does not reproduce my problem!):

using Nito.AsyncEx;
using StackExchange.Redis;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace RedisAzureLockingTest
{
    class Program
    {
        static void Main(string[] args)
        {
            AsyncContext.Run(async () =>
            {

                var cm = await ConnectionMultiplexer.ConnectAsync("blah:6379,ssl=false,password=blah,defaultDatabase=1,syncTimeout=5000");
                var db = cm.GetDatabase();

                // store key
                RedisKey key = "thisisourtest";
                await db.StringSetAsync(key, "initial value", flags: CommandFlags.DemandMaster); // SET

                // acquire lock
                RedisKey lockkey = "thisisourtest.lock";
                string locktoken = Guid.NewGuid().ToString();
                bool success = await db.LockTakeAsync(lockkey, locktoken, TimeSpan.FromDays(1), CommandFlags.DemandMaster);
                if (!success) throw new InvalidOperationException("Sure ok - lock couldnt be taken");

                try
                {
                    // do some stuff whilst the lock is taken

                    var oldval = await db.StringGetAsync(key, CommandFlags.DemandMaster);
                    if (oldval.IsNullOrEmpty) throw new InvalidOperationException("Key doesnt exist");

                    // persist an update
                    var newval = Guid.NewGuid().ToString();
                    await db.StringSetAsync(key, newval, flags: CommandFlags.DemandMaster); // SET
                }
                finally
                {
                    // release lock
                    if (!await db.LockReleaseAsync(lockkey, locktoken, CommandFlags.DemandMaster))
                        throw new InvalidOperationException("Should never occur - we couldnt release our own lock  is now locked forever!");

                    // double check that the lock has been released
                    var locktok2 = await db.LockQueryAsync(lockkey, CommandFlags.DemandMaster);
                    if (locktok2.HasValue) throw new InvalidOperationException("Should never occur - we couldnt release our own lock is now locked forever! Even worse- lock release lied about releasing itself");
                }

                Console.WriteLine("WORKED");
            });

            Console.ReadLine();
        }
    }
}

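I have been testing locally against a simple single-instance redis and never had any problems. Now I'm trying it in another environment, using an Azure C0 Basic instance. I can reproduce the problem semi-reliably (I have now managed to point my local copy of the codebase at the Azure instance), but I have no idea what might be going wrong or how to debug it further.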

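The behaviour I observe is:

  1. LockTakeAsync works fine
  2. My "do some stuff" bit executes fine
  3. LockReleaseAsync appears to succeed (it returns TRUE), but the lock key is not removed from redis (confirmed with the redis-cli command-line tool).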

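I have tried:

  • Tracing on the ConnectionMultiplexer - nothing untoward shows up
  • Cutting my application down to a simple test case (see above) - but this doesn't reproduce the problem, so the cause must be something outside it
  • Switching the LockTake/LockRelease calls to the non-async versions - the problem persists
  • Adding logging around the LockReleaseAsync call (including SET "DEBUG" calls to redis to mark the trace - see below) to confirm the exact order of events
  • Adding the LockQueryAsync call in the snippet above to confirm that the lock is still held!
  • Specifying that all my commands should execute on the master.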

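The only thing left was to run MONITOR on the redis instance and capture a trace of what SE.Redis is doing. When it executes locally and the application works, I get something like this (key names changed):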

1462360519.322029 [1 37.157.34.228:1995] "SET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock" "647672fd-ae06-4b6e-be67-341ac583a366" "EX" "86400" "NX"
1462360519.332884 [1 37.157.34.228:1995] "GET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb"
1462360519.342668 [1 37.157.34.228:1995] "SET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb" "ChIJHcrmkwgwX0cRkssud4TmJcsQARoSCQAAAAAAAAAAEQADAAAAAAAHIgNHQlAyCQi+koDo9oL9GToJCL6SgOj2gv0ZQgkIvvqI77mD/Rk="
1462360519.354847 [1 37.157.34.228:1995] "GET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock"
1462360519.364666 [1 37.157.34.228:1995] "SET" "DEBUG" "1"
1462360519.387834 [1 37.157.34.228:1995] "WATCH" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock"
1462360519.387866 [1 37.157.34.228:1995] "GET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock"
1462360519.401686 [1 37.157.34.228:1995] "MULTI"
1462360519.401708 [1 37.157.34.228:1995] "DEL" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock"
1462360519.401726 [1 37.157.34.228:1995] "EXEC"
1462360519.414845 [1 37.157.34.228:1995] "SELECT" "1"
1462360519.414862 [1 37.157.34.228:1995] "SET" "DEBUG" "2"
1462360519.424950 [1 37.157.34.228:1995] "GET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb.lock"
1462360519.452993 [1 37.157.34.228:1995] "GET" "thisisourtest.93e6ca1d-3008-475f-92cb-2e7784e625cb"

When I run against Azure and recreate the problem, I get:

1462359810.253275 [1 23.97.166.137:1277] "SET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69.lock" "35be6e88-7240-4772-ac2d-220a57ed1a79" "EX" "86400" "NX"
1462359810.256639 [1 23.97.166.137:1277] "GET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69"
1462359810.258605 [1 23.97.166.137:1277] "SET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69" "ChIJsxlhwj7M8UgRiDRsex06+2kQARoSCQAAAAAAAAAAEQADAAAAAAADIgNHQlAyCQi1nJ6N3IL9GToJCLWcno3cgv0ZQgkItYSnlJ+D/Rk="
1462359810.260233 [1 23.97.166.137:1277] "GET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69.lock"
1462359810.262790 [1 23.97.166.137:1277] "SET" "DEBUG" "1"
1462359810.283693 [1 23.97.166.137:1277] "WATCH" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69.lock"
1462359810.283724 [1 23.97.166.137:1277] "GET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69.lock"
1462359812.329321 [0 23.97.166.137:1257] "UNSUBSCRIBE" "U\xc7\n\xae\xa7\x1c\x84K\x8f\x1ft\x00\\j\xc2j"
1462359812.329374 [3 23.97.166.137:1256] "INFO" "replication"
1462359812.357770 [1 23.97.166.137:1259] "INFO" "replication"
1462359814.186895 [0 23.97.166.137:1312] "INFO" "replication"
1462359815.285593 [1 23.97.166.137:1277] "UNWATCH"
1462359815.285621 [1 23.97.166.137:1277] "SET" "DEBUG" "2"
1462359815.292618 [1 23.97.166.137:1277] "GET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69.lock"
1462359815.302945 [1 23.97.166.137:1277] "GET" "thisisourtest.c26119b3-cc3e-48f1-8834-6c7b1d3afb69"

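To me, it looks as if the WATCH fails (is replication in Azure involved?) and so the release fails. SE.Redis doesn't appear to issue the DEL command (which is fine) or MULTI/EXEC, and an UNWATCH shows up about five seconds after the WATCH instead. But LockReleaseAsync doesn't report this, and I can't see any calls in the MONITOR log that affect the key in question.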
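
One way to narrow this down might be to issue the same check-and-delete explicitly through the SE.Redis transaction API, so that the outcome of each step is visible. This is a minimal sketch, assuming LockReleaseAsync does roughly what the local trace shows (WATCH, GET, MULTI, DEL, EXEC). LockDiagnostics and ReleaseWithTransactionAsync are hypothetical names for illustration, not part of SE.Redis:

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

static class LockDiagnostics
{
    // Hypothetical helper (not part of StackExchange.Redis): deletes lockKey only
    // if it still holds lockToken, and logs how far the transaction got.
    public static async Task<bool> ReleaseWithTransactionAsync(
        IDatabase db, RedisKey lockKey, RedisValue lockToken)
    {
        ITransaction tran = db.CreateTransaction();

        // Precondition: the key must still hold our token. The local MONITOR trace
        // suggests this is the WATCH + GET pair that precedes MULTI.
        ConditionResult tokenMatches =
            tran.AddCondition(Condition.StringEqual(lockKey, lockToken));

        // Queued inside the transaction; only await it after a successful commit.
        Task<bool> deleted = tran.KeyDeleteAsync(lockKey);

        // True only if the precondition held and the transaction was executed.
        bool committed = await tran.ExecuteAsync(CommandFlags.DemandMaster);

        Console.WriteLine(
            $"committed={committed}, tokenMatches={tokenMatches.WasSatisfied}");

        if (!committed) return false; // aborted: the DEL was never executed
        return await deleted;         // true if the key was actually removed
    }
}
```

Calling `await LockDiagnostics.ReleaseWithTransactionAsync(db, lockkey, locktoken)` in the finally block in place of LockReleaseAsync would at least show whether the transaction was committed or aborted, which can then be compared against the MONITOR output.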

Stumped.

Any ideas on how to isolate this further? Trying to build a small test case isn't going to be quick.

Cheers!
