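I have a system of 10 machines, and I need to run a particular task on each machine one at a time, in a synchronized order. Essentially, only one machine should be performing the task at any given time. We already use Consul for other purposes, and I was wondering whether we could use Consul for this as well.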
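I read more about it, and it looks like we can use Consul leader election: each machine tries to acquire a lock, does the work, and then releases the lock. Once the lock is released, the other machines try to acquire it again and do the same work. That way, the work is serialized to one machine at a time.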
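I decided to use the C# PlayFab ConsulDotNet library, which already has this capability built in, but I am open to a better option if there is one. The Action method below, from my code base, is called on every machine at almost the same time through a watcher mechanism.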
private void Action() {
    // Try to acquire lock using Consul.
    // If lock acquired then DoTheWork() otherwise keep waiting for it until lock is acquired.
    // Once work is done, release the lock
    // so that some other machine can acquire the lock and do the same work.
}
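In the method above, I need to do the following:
- Try to acquire the lock. If the lock cannot be acquired, wait for it, because another machine may have grabbed it first.
- If the lock is acquired, call DoTheWork().
- Once the work is done, release the lock so that another machine can acquire it and do the same work.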
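The three steps above can be sketched without Consul at all. This is only a local simulation under stated assumptions: a SemaphoreSlim stands in for the distributed lock, ten tasks stand in for the ten machines, and the names OneAtATimeSketch, ActionSketch, and RunAllAsync are hypothetical. It shows the acquire / work / release shape, with the release in a finally block so the lock is freed even if the work throws.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OneAtATimeSketch
{
    // Stand-in for the distributed lock: at most one holder at a time.
    // Assumption: in the real system this would be a Consul lock, not a SemaphoreSlim.
    private static readonly SemaphoreSlim LockStandIn = new SemaphoreSlim(1, 1);

    private static readonly object Gate = new object();
    private static int currentHolders = 0;

    // Bookkeeping used only to observe the behaviour of the sketch.
    public static int MaxConcurrentHolders = 0;
    public static readonly List<int> CompletionOrder = new List<int>();

    // Hypothetical placeholder for the real task.
    private static async Task DoTheWork(int machineId)
    {
        lock (Gate)
        {
            currentHolders++;
            MaxConcurrentHolders = Math.Max(MaxConcurrentHolders, currentHolders);
        }

        await Task.Delay(10); // pretend to work

        lock (Gate)
        {
            CompletionOrder.Add(machineId);
            currentHolders--;
        }
    }

    public static async Task ActionSketch(int machineId, CancellationToken token)
    {
        // 1. Wait until the lock is available.
        await LockStandIn.WaitAsync(token);
        try
        {
            // 2. Lock acquired: do the work.
            await DoTheWork(machineId);
        }
        finally
        {
            // 3. Always release, so the next "machine" can proceed.
            LockStandIn.Release();
        }
    }

    // Simulates all ten machines being triggered at almost the same time.
    public static async Task RunAllAsync()
    {
        var machines = Enumerable.Range(1, 10)
            .Select(id => ActionSketch(id, CancellationToken.None));
        await Task.WhenAll(machines);
    }

    public static async Task Main()
    {
        await RunAllAsync();
        Console.WriteLine(
            $"Machines finished: {CompletionOrder.Count}, max concurrent: {MaxConcurrentHolders}");
    }
}
```

With a real Consul lock the waiting and releasing would go over the network and could fail, so the error handling would need to be richer than in this simulation.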
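The idea is that all 10 machines should run DoTheWork() one at a time, in a synchronized order. Based on this blog and this blog, I decided to modify their examples to fit our needs: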
Below is my LeaderElectionService class:
using System;
using System.Threading;
using System.Threading.Tasks;
using Consul;

public class LeaderElectionService
{
    public LeaderElectionService(string leadershipLockKey)
    {
        this.key = leadershipLockKey;
    }

    public event EventHandler<LeaderChangedEventArgs> LeaderChanged;

    string key;
    CancellationTokenSource cts = new CancellationTokenSource();
    Timer timer;
    bool lastIsHeld = false;
    IDistributedLock distributedLock;

    public void Start()
    {
        timer = new Timer(async (object state) => await TryAcquireLock((CancellationToken)state), cts.Token, 0, Timeout.Infinite);
    }

    private async Task TryAcquireLock(CancellationToken token)
    {
        if (token.IsCancellationRequested)
            return;
        try
        {
            if (distributedLock == null)
            {
                var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.host.domain.com") };
                ConsulClient client = new ConsulClient(clientConfig);
                distributedLock = await client.AcquireLock(new LockOptions(key) { LockTryOnce = true, LockWaitTime = TimeSpan.FromSeconds(3) }, token).ConfigureAwait(false);
            }
            else
            {
                if (!distributedLock.IsHeld)
                {
                    await distributedLock.Acquire(token).ConfigureAwait(false);
                }
            }
        }
        catch (LockMaxAttemptsReachedException ex)
        {
            // This is expected if it couldn't acquire the lock within the first attempt.
            Console.WriteLine(ex.StackTrace);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.StackTrace);
        }
        finally
        {
            bool lockHeld = distributedLock?.IsHeld == true;
            HandleLockStatusChange(lockHeld);
            // Retrigger the timer after a 10-second delay (in this example). Delay for 7s if not held, as the AcquireLock call will block for ~3s in every failed attempt.
            timer.Change(lockHeld ? 10000 : 7000, Timeout.Infinite);
        }
    }

    protected virtual void HandleLockStatusChange(bool isHeldNew)
    {
        // Is this the right way to check and do the work here?
        // In general I want to call method "DoTheWork" in "Action" method itself
        // And then release and destroy the session once work is done.
        if (isHeldNew)
        {
            // DoTheWork();
            Console.WriteLine("Hello");
            // And then where should I release the lock so that other machine can try to grab it?
            // distributedLock.Release();
            // distributedLock.Destroy();
        }

        if (lastIsHeld == isHeldNew)
            return;
        else
        {
            lastIsHeld = isHeldNew;
        }

        if (LeaderChanged != null)
        {
            LeaderChangedEventArgs args = new LeaderChangedEventArgs(lastIsHeld);
            foreach (EventHandler<LeaderChangedEventArgs> handler in LeaderChanged.GetInvocationList())
            {
                try
                {
                    handler(this, args);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.StackTrace);
                }
            }
        }
    }
}
Below is my LeaderChangedEventArgs class:
public class LeaderChangedEventArgs : EventArgs
{
    private bool isLeader;

    public LeaderChangedEventArgs(bool isHeld)
    {
        isLeader = isHeld;
    }

    public bool IsLeader { get { return isLeader; } }
}
Many parts of the code above may not be needed for my use case, but the idea is the same.
Problem Statement
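In my Action method, I want to use the class above and perform the task as soon as the lock is acquired; otherwise it should keep waiting for the lock. Once the work is done, it should release the lock and destroy the session so that another machine can grab the lock and do the work. I am a bit confused about how to use the class above correctly in the method below.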
private void Action() {
    LeaderElectionService electionService = new LeaderElectionService("data/process");
    // electionService.LeaderChanged += (source, arguments) => Console.WriteLine(arguments.IsLeader ? "Leader" : "Slave");
    electionService.Start();
    // now how do I wait for the lock to be acquired here indefinitely
    // And once lock is acquired, do the work and then release and destroy the session
    // so that other machine can grab the lock and do the work
}
I started using C# only recently, which is why I am a bit unsure how to make this work efficiently in production with Consul and this library.
Update
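I tried the code below based on your suggestion (I think I had tried it before as well), but for some reason, as soon as it reaches the line await distributedLock.Acquire(cancellationToken); it automatically returns to the Main method. It never gets as far as my "Doing Some Work!" printout. Does CreateLock actually work? I expected it to create data/lock on Consul (since it does not exist), then try to acquire the lock on it and, if acquired, do the work and then release it for the other machines.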
private static CancellationTokenSource cts = new CancellationTokenSource();

public static void Main(string[] args)
{
    Action(cts.Token);
    Console.WriteLine("Hello World");
}

private static async Task Action(CancellationToken cancellationToken)
{
    const string keyName = "data/lock";
    var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.test.host.com") };
    ConsulClient client = new ConsulClient(clientConfig);
    var distributedLock = client.CreateLock(keyName);
    while (true)
    {
        try
        {
            // Try to acquire lock
            // As soon as it comes to this line,
            // it just goes back to main method automatically. not sure why
            await distributedLock.Acquire(cancellationToken);
            // Lock is acquired
            // DoTheWork();
            Console.WriteLine("Doing Some Work!");
            // Work is done. Jump out of loop to release the lock
            break;
        }
        catch (LockHeldException)
        {
            // Cannot acquire the lock. Wait a while then retry
            await Task.Delay(TimeSpan.FromSeconds(10), cancellationToken);
        }
        catch (Exception)
        {
            // TODO: Handle exception thrown by DoTheWork method
            // Here we jump out of the loop to release the lock
            // But you can try to acquire the lock again based on your requirements
            break;
        }
    }
    // Release and destroy the lock
    // So that other machine can grab the lock and do the work
    await distributedLock.Release(cancellationToken);
    await distributedLock.Destroy(cancellationToken);
}