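I am trying to build a job queue using two Redis masters in two EC2 availability zones. All LPUSH operations are performed at the application layer against both masters, one in each AZ. Ideally I would use GitHub's resque, but resque does not appear to have any notion of multiple masters across multiple AZs.
I need to ensure that only one worker is working on a given job. Some workers will run in AZ 1A and talk to the Redis machine in 1A, and some will run in AZ 1B and talk to the machine in 1B. I need to avoid the situation where a worker in 1A and a worker in 1B each pull the same job from a different Redis master and try to process it at the same time.
Does this worker pseudocode have any race conditions that I may have missed?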
    job_id = master1.BRPOPLPUSH "queue", "working"
    m1lock = master1.SETNX "lock.#{job_id}"
    m2lock = master2.SETNX "lock.#{job_id}"
    completed = master1.ZSCORE "completed", job_id
    if completed
      # must have been completed just now on other server, no-op
      master1.LREM "working", 0, job_id
      master1.del "lock.#{job_id}"
      master2.del "lock.#{job_id}"
    elsif not m1lock or not m2lock
      # other server is working on it? We will put back at the end of our queue
      master1.LPUSH "queue", job_id
      master1.LREM "working", 0, job_id
      master1.del "lock.#{job_id}" if m1lock
      master2.del "lock.#{job_id}" if m2lock
    else
      # have a lock, it's not complete, so do work
      do_work(job_id)
      now = Time.now.to_i
      master1.ZADD "completed", now, job_id
      master2.ZADD "completed", now, job_id
      master1.del "lock.#{job_id}"
      master2.del "lock.#{job_id}"
      master1.LREM "working", 0, job_id
      master2.LREM "queue", 0, job_id # not strictly necessary b/c of "completed"
    end
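
To make the scenario I am worried about concrete, here is a minimal sketch that runs without a Redis server. `FakeRedis` is a hypothetical in-memory stand-in that I made up for illustration (it is not a real client library), `rpoplpush` stands in for the blocking `BRPOPLPUSH`, and the job ID `job-42` and the lock values `1A`/`1B` are placeholders. It assumes the 1B worker pops from its own master, as described above, and it only walks through one sequential interleaving.

```ruby
# Hypothetical in-memory stand-in for one Redis master (assumption, not a real client).
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
    @strings = {}
  end

  # LPUSH: prepend a value to the head of the list.
  def lpush(key, value)
    @lists[key].unshift(value)
  end

  # Non-blocking analogue of BRPOPLPUSH: move the tail of src to the head of dst.
  def rpoplpush(src, dst)
    value = @lists[src].pop
    @lists[dst].unshift(value) unless value.nil?
    value
  end

  # SETNX: set the key only if it is absent; true means the key was set.
  def setnx(key, value)
    return false if @strings.key?(key)
    @strings[key] = value
    true
  end
end

master1 = FakeRedis.new
master2 = FakeRedis.new

# The application layer pushes every job to both masters.
[master1, master2].each { |master| master.lpush("queue", "job-42") }

# A worker in 1A pops from master1; a worker in 1B pops from master2.
job_a = master1.rpoplpush("queue", "working")
job_b = master2.rpoplpush("queue", "working")
both_popped = (job_a == job_b)

# Each worker then tries to take the lock on both masters, master1 first.
a_locks = [master1.setnx("lock.#{job_a}", "1A"), master2.setnx("lock.#{job_a}", "1A")]
b_locks = [master1.setnx("lock.#{job_b}", "1B"), master2.setnx("lock.#{job_b}", "1B")]

puts "both workers popped the same job: #{both_popped}"
puts "worker 1A locks: #{a_locks.inspect}"
puts "worker 1B locks: #{b_locks.inspect}"
```

In this ordering both workers receive the same job ID, the 1A worker gets both locks, and the 1B worker gets neither. The sketch does not cover orderings where the two workers' `SETNX` calls interleave, or where a master is unreachable, which are the cases I am least sure about.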