multithreading - What is the fastest race free method for polling a lockless queue?

Question

Say we have a single-producer-thread single-consumer-thread lockless queue, and that the producer may go long periods without producing any data. It would be beneficial to let the consumer thread sleep when there is nothing in the queue (for the power savings and freeing up the CPU for other processes/threads). If the queue were not lockless, the straightforward way to solve this problem is to have the producing thread lock a mutex, do its work, signal a condition variable and unlock, and for the reading thread to lock the mutex, wait on the condition variable, do its reading, then unlock. But if we're using a lockless queue, using a mutex the exact same way would eliminate the performance we gain from using a lockless queue in the first place.

The naive solution is to have the producer after each insertion into the queue lock the mutex, signal the condition variable, then unlock the mutex, keeping the actual work (the insertion into the queue) completely outside the lock, and to have the consumer do the same, locking the mutex, waiting on the condition variable, unlocking it, pulling everything off the queue, then repeat, keeping the reading of the queue outside the lock. There's a race condition here though: between the reader pulling off the queue and going to sleep, the producer may have inserted an item into the queue. Now the reader will go to sleep, and may stay so indefinitely until the producer inserts another item and signals the condition variable again. This means you can occasionally end up with particular items seeming to take a very long time to travel through the queue. If your queue is always constantly active this may not be a problem, but if it were always active you could probably forget the condition variable entirely.

AFAICT the solution is for the producer to behave the same as if it were working with a regular needs-locking queue. It should lock the mutex, insert into the lockless queue, signal the condition variable, unlock. However, the consumer should behave differently. When it wakes, it should unlock the mutex immediately instead of waiting until it's read the queue. Then it should pull as much of the queue as it can and process it. Finally, only when the consumer is thinking of going to sleep, should it lock the mutex, check if there's any data, then if so unlock and process it or if not then wait on the condition variable. This way the mutex is contended less often than it would be with a lockfull queue, but there's no risk of going to sleep with data still left on the queue.

Is this the best way to do it? Are there alternatives?

Note: By 'fastest' I really mean 'fastest without dedicating a core to checking the queue over and over,' but that wouldn't fit in the title ;p

One alternative: Go with the naive solution, but have the consumer wait on the condition variable with a timeout corresponding to the maximum latency you are willing to tolerate for an item traveling through the queue. If the desired timeout is fairly short though, it may be below the minimum wait time for your OS or still consume too much CPU.

score 1 · Accepted Answer

我假设您正在使用Dobbs 博士文章中的无锁单生产者单消费者队列- 或类似的东西 - 所以我将使用那里的术语。

在这种情况下，您在“AFAICT”开头的段落中建议的答案很好，但我认为可以稍微优化一下：

在消费者中-正如您所说，当消费者用尽队列并考虑休眠时（并且只有这样），它会锁定互斥锁，再次检查队列，然后要么
- 如果队列中有新项目，则释放互斥锁并继续工作
- 或阻塞条件变量（自然地，当它醒来以找到非空队列时释放互斥锁）。
在生产者中：
- 先取一份last，叫它saved_last
- 像往常一样添加项目new_item，然后复制divider指针，调用它saved_divider。
- 如果的值saved_divider等于new_item，即你刚刚插入的对象，那么你的对象已经被消费了，你的工作就完成了。
- 否则，如果的值saved_divider不等于saved_last，则不需要唤醒消费者。这是因为：
  - 在您添加新对象之后的某个时间，您知道divider尚未达到new_item或saved_last
  - 自从您开始插入以来，last只有这两个值
  - divider消费者只有在等于时才会停止last
  - 因此，消费者必须仍然醒着，并且会在睡觉前拿到您的新商品。
- 否则锁定互斥锁，向 condvar 发出信号，然后释放互斥锁。（在此处获取互斥锁可确保您在消费者注意到队列为空和实际阻塞 condvar 之间的时间内不会向 condar 发出信号。）

这确保了在消费者倾向于保持忙碌的情况下，当您知道消费者仍然醒着（而不是要睡觉）时避免锁定互斥锁。它还最大限度地减少了持有互斥锁的时间，以进一步减少争用的可能性。

上面的解释很罗嗦（因为我想解释它为什么起作用，而不仅仅是算法是什么），但由此产生的代码应该很简单。

当然，它是否真的值得做取决于很多事情，我鼓励你衡量性能是否对你至关重要。在大多数情况下，mutex/condvar 原语的良好实现在内部使用原子操作，因此它们通常只在需要时进行内核调用（最昂贵的位！） - 即，如果需要阻塞，或者肯定有一些线程等待 - 因此不调用互斥函数节省的时间可能仅相当于库调用的开销。

multithreading - What is the fastest race free method for polling a lockless queue?

1 回答 1

Related

Reference