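I have a system of 10 machines, and I need to run a particular task on each machine one at a time, in a synchronized order. In other words, only one machine should be running the task at any given time. We already use Consul for other purposes, and I was wondering whether we could use Consul for this as well.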

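I read more about it, and it looks like we can use Consul leader election: each machine tries to acquire a lock, does the work, and then releases the lock. Once the lock is released, the other machines try to acquire it again and do the same work. That way the task runs on one machine at a time.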

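I decided to use the C# PlayFab ConsulDotNet library, which already has this capability built in, but I am open to a better option if there is one. The Action method below in my code base is called on every machine at almost the same time through a watcher mechanism.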

 private void Action() {
    // Try to acquire lock using Consul.
    // If lock acquired then DoTheWork() otherwise keep waiting for it until lock is acquired.
    // Once work is done, release the lock
    // so that some other machine can acquire the lock and do the same work.
 }

In the method above, I need to do the following:

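  • Try to acquire the lock. If the lock cannot be acquired, wait for it, because another machine may have grabbed it first.
  • If the lock is acquired, call DoTheWork().
  • Once the work is done, release the lock so that another machine can acquire it and do the same work.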

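The idea is that all 10 machines should run DoTheWork() one at a time, in a synchronized order. Based on this blog and this blog, I decided to adapt their example to fit our needs: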

Below is my LeaderElectionService class:

public class LeaderElectionService
{
    public LeaderElectionService(string leadershipLockKey)
    {
        this.key = leadershipLockKey;
    }

    public event EventHandler<LeaderChangedEventArgs> LeaderChanged;
    string key;
    CancellationTokenSource cts = new CancellationTokenSource();
    Timer timer;
    bool lastIsHeld = false;
    IDistributedLock distributedLock;

    public void Start()
    {
        timer = new Timer(async (object state) => await TryAcquireLock((CancellationToken)state), cts.Token, 0, Timeout.Infinite);
    }

    private async Task TryAcquireLock(CancellationToken token)
    {
        if (token.IsCancellationRequested)
            return;
        try
        {
            if (distributedLock == null)
            {
                var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.host.domain.com") };
                ConsulClient client = new ConsulClient(clientConfig);
                distributedLock = await client.AcquireLock(new LockOptions(key) { LockTryOnce = true, LockWaitTime = TimeSpan.FromSeconds(3) }, token).ConfigureAwait(false);
            }
            else
            {
                if (!distributedLock.IsHeld)
                {
                    await distributedLock.Acquire(token).ConfigureAwait(false);
                }
            }
        }
        catch (LockMaxAttemptsReachedException ex)
        {
            //this is expected if it couldn't acquire the lock within the first attempt.
            Console.WriteLine(ex.StackTrace);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.StackTrace);
        }
        finally
        {
            bool lockHeld = distributedLock?.IsHeld == true;
            HandleLockStatusChange(lockHeld);
            //Retrigger the timer after a 10 seconds delay (in this example). Delay for 7s if not held as the AcquireLock call will block for ~3s in every failed attempt.
            timer.Change(lockHeld ? 10000 : 7000, Timeout.Infinite);
        }
    }

    protected virtual void HandleLockStatusChange(bool isHeldNew)
    {
        // Is this the right way to check and do the work here?
        // In general I want to call method "DoTheWork" in "Action" method itself
        // And then release and destroy the session once work is done.
        if (isHeldNew)
        {
            // DoTheWork();
            Console.WriteLine("Hello");
            // And then were should I release the lock so that other machine can try to grab it?
            // distributedLock.Release();
            // distributedLock.Destroy();
        }

        if (lastIsHeld == isHeldNew)
            return;
        else
        {
            lastIsHeld = isHeldNew;
        }

        if (LeaderChanged != null)
        {
            LeaderChangedEventArgs args = new LeaderChangedEventArgs(lastIsHeld);
            foreach (EventHandler<LeaderChangedEventArgs> handler in LeaderChanged.GetInvocationList())
            {
                try
                {
                    handler(this, args);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.StackTrace);
                }
            }
        }
    }
}

Below is my LeaderChangedEventArgs class:

public class LeaderChangedEventArgs : EventArgs
{
    private bool isLeader;

    public LeaderChangedEventArgs(bool isHeld)
    {
        isLeader = isHeld;
    }

    public bool IsLeader { get { return isLeader; } }
}

Many parts of the code above may not be needed for my use case, but the idea is the same.

Problem Statement

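Now, in my Action method, I would like to use the class above and perform the task as soon as the lock is acquired, and otherwise keep waiting for the lock. Once the work is done, I want to release the lock and destroy the session so that another machine can grab it and do the work. I am a bit confused about how to use the class above correctly in the method below.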

 private void Action() {
    LeaderElectionService electionService = new LeaderElectionService("data/process");
    // electionService.LeaderChanged += (source, arguments) => Console.WriteLine(arguments.IsLeader ? "Leader" : "Slave");
    electionService.Start();

    // now how do I wait for the lock to be acquired here indefinitely
    // And once lock is acquired, do the work and then release and destroy the session
    // so that other machine can grab the lock and do the work
 }

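I only recently started working with C#, which is why I am a bit unsure how to make this work efficiently in production using Consul and this library.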

Update

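I tried the code below based on your suggestion (I think I had tried something similar earlier), but for some reason, as soon as it reaches the line await distributedLock.Acquire(cancellationToken);, control goes straight back to the Main method. It never gets as far as printing Doing Some Work!. Does CreateLock actually work? I expected it to create data/lock in Consul (since it does not exist yet), then try to acquire the lock, do the work if the lock is acquired, and then release it for the other machines.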

private static CancellationTokenSource cts = new CancellationTokenSource();

public static void Main(string[] args)
{
    Action(cts.Token);
    Console.WriteLine("Hello World");
}

private static async Task Action(CancellationToken cancellationToken)
{
    const string keyName = "data/lock";

    var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.test.host.com") };
    ConsulClient client = new ConsulClient(clientConfig);
    var distributedLock = client.CreateLock(keyName);

    while (true)
    {
        try
        {
            // Try to acquire lock
            // As soon as it comes to this line,
            // it just goes back to main method automatically. not sure why
            await distributedLock.Acquire(cancellationToken);

            // Lock is acquired
            // DoTheWork();
            Console.WriteLine("Doing Some Work!");

            // Work is done. Jump out of loop to release the lock
            break;
        }
        catch (LockHeldException)
        {
            // Cannot acquire the lock. Wait a while then retry
            await Task.Delay(TimeSpan.FromSeconds(10), cancellationToken);
        }
        catch (Exception)
        {
            // TODO: Handle exception thrown by DoTheWork method

            // Here we jump out of the loop to release the lock
            // But you can try to acquire the lock again based on your requirements
            break;
        }
    }

    // Release and destroy the lock
    // So that other machine can grab the lock and do the work
    await distributedLock.Release(cancellationToken);
    await distributedLock.Destroy(cancellationToken);
}

1 Answer

IMO, the LeaderElectionService from those blogs is overkill in your case.

Update 1

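The while loop is not needed, because:

  1. ConsulClient is a local variable
    • There is no need to check the IsHeld property
  2. Acquire blocks indefinitely unless you
    • set LockTryOnce to true in LockOptions, or
    • set a timeout on the CancellationToken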

Side note: there is no need to call the Destroy method after calling Release on the distributed lock (reference).

private async Task Action(CancellationToken cancellationToken)
{
    const string keyName = "YOUR_KEY";

    var client = new ConsulClient();
    var distributedLock = client.CreateLock(keyName);

    try
    {
        // Try to acquire lock
        // NOTE:
        //   Acquire method will block indefinitely unless
        //     1. Set LockTryOnce = true in LockOptions
        //     2. Pass a timeout to cancellation token
        await distributedLock.Acquire(cancellationToken);

        // Lock is acquired
        DoTheWork();
    }
    catch (Exception)
    {
        // TODO: Handle exception thrown by DoTheWork method
    }

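    // Release the lock (not necessary to invoke Destroy method),
    // so that other machine can grab the lock and do the work.
    // Guard on IsHeld: if Acquire failed or was cancelled, the lock was never held.
    if (distributedLock.IsHeld)
    {
        await distributedLock.Release(cancellationToken);
    }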
}
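
To illustrate the second option in the NOTE above, here is a minimal sketch of bounding a wait with a CancellationTokenSource timeout. It does not talk to Consul: FakeAcquire is a hypothetical stand-in for distributedLock.Acquire that never completes on its own, and the names TimeoutDemo and TryAcquireWithTimeout are made up for this example. Which exception the real Acquire throws on cancellation is an assumption here and should be checked against the library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for distributedLock.Acquire(token): it never
    // completes on its own and only ends when the token is cancelled.
    public static Task FakeAcquire(CancellationToken token)
    {
        return Task.Delay(Timeout.InfiniteTimeSpan, token);
    }

    // Returns true if the wait completed, false if the timeout cancelled it.
    public static async Task<bool> TryAcquireWithTimeout(TimeSpan timeout)
    {
        // The source cancels its token automatically once the timeout elapses.
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                await FakeAcquire(cts.Token);
                return true;
            }
            catch (OperationCanceledException)
            {
                // Assumption: cancellation surfaces as OperationCanceledException.
                return false;
            }
        }
    }

    public static void Main()
    {
        bool acquired = TryAcquireWithTimeout(TimeSpan.FromMilliseconds(200))
            .GetAwaiter().GetResult();
        Console.WriteLine(acquired ? "Lock acquired" : "Timed out waiting for the lock");
    }
}
```

With the real library, the token from such a CancellationTokenSource would presumably be passed to Acquire in place of FakeAcquire, so a machine that cannot get the lock gives up after the timeout instead of waiting forever.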

Update 2

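The reason OP's code simply returns to the Main method is that the Action method is not awaited. If you are using C# 7.1 or later, you can use async Main and await the Action method.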

public static async Task Main(string[] args)
{
    await Action(cts.Token);
    Console.WriteLine("Hello World");
}
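
If async Main is not available (before C# 7.1), one alternative is to block on the returned task from a synchronous Main. A minimal sketch, where AwaitDemo and DoWorkAsync are hypothetical stand-ins for the program class and the Action method:

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for Action: control returns to the caller at the
    // first await on an incomplete task, and the rest runs later.
    public static async Task<string> DoWorkAsync()
    {
        await Task.Delay(100);
        return "Doing Some Work!";
    }

    public static void Main()
    {
        // Without waiting on the task, Main could finish (and the process
        // exit) before the work after the await ever runs.
        // GetAwaiter().GetResult() blocks until the task has completed.
        string result = DoWorkAsync().GetAwaiter().GetResult();
        Console.WriteLine(result);
        Console.WriteLine("Hello World");
    }
}
```

Blocking like this is usually acceptable in a console application's Main, but it can deadlock in code that runs under a synchronization context (for example UI applications), so async Main is the better choice where it is available.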
Answered 2020-10-20T05:48:26.440