
I am looking for a way to implement a lock-free queue data structure that supports a single producer and multiple consumers. I have looked at the classic method by Maged Michael and Michael Scott (1996), but their version uses a linked list. I would like an implementation that uses a bounded circular buffer. Something using atomic variables?

As a side note, I am not sure why these classic methods are designed around linked lists, which require a lot of dynamic memory management. In a multithreaded program, all the memory-management routines are serialized. Aren't we defeating the benefits of lock-free methods by combining them with dynamic data structures?

I am trying to code this in C/C++ using the pthread library on the Intel 64-bit architecture.

Thank you, Shirish

4 Answers


The use of a circular buffer makes a lock necessary, since blocking is needed to prevent the head from going past the tail. But otherwise the head and tail pointers can easily be updated atomically. Or in some cases the buffer can be so large that overwriting is not an issue. (In real life you will see this in automated trading systems, with circular buffers sized to hold X minutes of market data. If you are X minutes behind, you have far worse problems than overwriting your buffer.)
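To make the "head and tail updated atomically" part concrete, here is a minimal sketch of the simpler single-producer / single-consumer case, assuming C++11 std::atomic and a power-of-two capacity; push simply fails instead of blocking when the buffer is full. The class and names are mine, not from this answer.

#include <atomic>
#include <cstddef>

template <typename T, size_t N>               // N must be a power of two
class SpscRing {
public:
    bool push(const T& v) {                   // called from the producer thread only
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;                     // full: refuse rather than overwrite
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    bool pop(T& out) {                        // called from the consumer thread only
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;                     // empty
        out = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);   // release the slot
        return true;
    }

private:
    T buf_[N];
    std::atomic<size_t> head_{0}, tail_{0};
};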

When I implemented the MS queue in C++, I built a lock-free allocator using a stack, which is very easy to implement. If I have MSQueue<T>, then at compile time I know sizeof(MSQueue<T>::node). I then make a stack of N buffers of the required size. The N can grow: if pop() returns null, it is easy to ask the heap for more blocks, and these are pushed onto the stack. Outside of that possibly blocking call for more memory, this is a lock-free operation.
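As a rough illustration of that idea (my own sketch, not this answer's code), a fixed-size-block free list can be built as a Treiber stack with C++11 atomics. Blocks handed out by it are never returned to the operating system, which is what makes dereferencing old->next below safe; a production version would still need an ABA guard such as a counted pointer.

#include <atomic>

struct FreeList {
    struct Block { Block* next; };            // every block is at least sizeof(Block)

    std::atomic<Block*> head{nullptr};

    void push(void* p) {                      // return a block to the pool
        Block* b = static_cast<Block*>(p);
        Block* old = head.load(std::memory_order_relaxed);
        do {
            b->next = old;
        } while (!head.compare_exchange_weak(old, b,
                     std::memory_order_release, std::memory_order_relaxed));
    }

    void* pop() {                             // take a block, or nullptr if the pool is empty
        Block* old = head.load(std::memory_order_acquire);
        while (old && !head.compare_exchange_weak(old, old->next,
                          std::memory_order_acquire, std::memory_order_relaxed)) {
            // old is refreshed by compare_exchange_weak on failure; retry
        }
        return old;                           // caller falls back to the heap on nullptr
    }
};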

Note that the T cannot have a non-trivial dtor. I worked on a version that did allow non-trivial dtors, and it actually worked. But I found it was easier just to make the T a pointer to the T that I wanted, where the producer released ownership and the consumer acquired ownership. This of course requires that the T itself is allocated using lock-free methods, but the same allocator I made with the stack works here as well.

In any case, the point of lock-free programming is not the raw speed of the data structures themselves. The points are these:

  1. Lock-free code makes me independent of the scheduler. Lock-based programming depends on the scheduler to make sure that the holders of a lock are running so that they can release the lock. This is what causes "priority inversion"; on Linux there are lock attributes (priority inheritance) to deal with this.
  2. If I am independent of the scheduler, the OS has a far easier time managing timeslices, and I get far less context switching.
  3. It is easier to write correct multithreaded programs using lock-free methods, since I don't have to worry about deadlock, livelock, scheduling, synchronization, etc. This is especially true with shared-memory implementations, where a process could die while holding a lock in shared memory and there is no way to release the lock.
  4. Lock-free methods are far easier to scale. In fact, I have implemented lock-free methods using messaging over a network. Distributed locks like this are a nightmare.

That said, there are many cases where lock-based methods are preferable and/or required:

  1. When updating things that are expensive or impossible to copy. Most lock-free methods use some sort of versioning: make a copy of the object, update the copy, check whether the shared version is still the same as when you copied it, and if so make your updated copy the current version. Otherwise copy it again, apply the update, and check again; keep doing this until it works (see the sketch after this list). This is fine when the objects are small, but if they are large, or contain file handles, etc., it is not recommended.
  2. Most types are impossible to access in a lock-free way, e.g. any STL container. These have invariants that require non-atomic access, for example assert(vector.size() == vector.end() - vector.begin()). So if you are updating/reading a vector that is shared, you have to lock it.
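The copy/update/compare idiom in the first item, as a minimal sketch assuming a small trivially-copyable type and C++11 atomics (the Counter type and record_hit function are illustrative, not from the answer):

#include <atomic>

struct Counter { int hits; int misses; };       // small enough for a single CAS on x86-64

std::atomic<Counter> shared{Counter{0, 0}};

void record_hit() {
    Counter old = shared.load();
    Counter updated;
    do {
        updated = old;                          // copy the shared object
        updated.hits += 1;                      // apply the update to the copy
        // compare_exchange fails (and refreshes 'old') if another thread
        // changed 'shared' since we copied it, so we redo the update
    } while (!shared.compare_exchange_weak(old, updated));
}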
answered 2010-04-24T14:12:05.130

This is an old question, but nobody has provided an accepted solution, so I am adding this for anyone else who may be searching.

This site: http://www.1024cores.net

provides some very useful lock-free / wait-free data structures, with thorough explanations.

What you are looking for is a lock-free solution to the reader/writer problem.

See: http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem

answered 2011-12-06T11:18:00.940

For a traditional, single-block circular buffer I don't think this can be done safely with atomic operations alone; you need to read too much in one go. Suppose you have a structure with the following fields:

struct CircularBuffer {
    uint8_t*     buf;
    unsigned int size;    // maximum buffer capacity
    unsigned int length;  // length of the data currently stored (assume writes keep this <= size)
    unsigned int offset;  // start of the currently stored data
};

When reading, you need to do the following (this is how I implemented it; you can swap some of the steps, as I will discuss in a moment):

  1. Check that the read length does not exceed the stored length
  2. Check that offset + read length does not run past the buffer boundary
  3. Read out the data
  4. Increase the offset and decrease the length

What do you have to synchronize (i.e. make atomic) to make this work? Essentially, merge steps 1 and 4 into one atomic step; or, to spell it out, do the following atomically:

  1. Check read_length; this could be something like read_length = min(read_length, length)
  2. Decrease length by read_length: length -= read_length
  3. Take a local copy of the offset: unsigned int local_offset = offset
  4. Increase offset by read_length: offset += read_length

After that you can do the memcpy (or whatever) starting at local_offset, checking whether your read runs past the end of the circular buffer (and splitting it into two memcpys if it does), and so on. This is only "reasonably" thread-safe: your write method can still overwrite the memory you are reading, so make sure your buffer is really big enough to minimize that possibility.

Now, while I can imagine combining steps 3 and 4 (I guess that is what they do in the linked-list case), or even steps 1 and 2, into one atomic operation, I do not see how you would do the whole transaction in a single atomic operation :).
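As an aside, on a 64-bit target the four bookkeeping steps themselves can be folded into a single compare-and-swap by packing offset and length into one 64-bit word. This is my own sketch under that assumption, not something this answer claims to have done, and it still does nothing to protect the data copy itself:

#include <atomic>
#include <algorithm>
#include <cstdint>

std::atomic<uint64_t> state{0};   // high 32 bits = offset, low 32 bits = length

// Claims up to 'want' bytes for reading; returns how many bytes were claimed
// and writes the offset to copy from into *local_offset.
uint32_t claim_read(uint32_t want, uint32_t buffer_size, uint32_t* local_offset) {
    uint64_t cur = state.load(std::memory_order_acquire);
    uint32_t take, off;
    do {
        off = static_cast<uint32_t>(cur >> 32);
        uint32_t len = static_cast<uint32_t>(cur);
        take = std::min(want, len);                     // step 1: clamp the read length
        if (take == 0) return 0;                        // nothing to read
        uint32_t new_off = (off + take) % buffer_size;  // step 4: advance the offset (wrapping)
        uint32_t new_len = len - take;                  // step 2: shrink the length
        uint64_t next = (static_cast<uint64_t>(new_off) << 32) | new_len;
        // if another reader (or the writer) changed 'state' meanwhile, retry
        if (state.compare_exchange_weak(cur, next,
                std::memory_order_acq_rel, std::memory_order_acquire))
            break;
    } while (true);
    *local_offset = off;                                // step 3: where to memcpy from
    return take;
}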

Alternatively, you could try to drop the length bookkeeping entirely if your consumers are very smart and always know exactly what to read. You then also need a new woffset variable, because the old way of determining the write offset as (offset+length)%size no longer works. Note that this gets quite close to the linked-list case, where you in fact always read exactly one element (of fixed, known size) from the list. There too, if you make it a circular list, you can read too much, or write over the position you are currently reading!

Finally: my advice is to just use locks. I use a CircularBuffer class (fully safe for both reading and writing) for real-time 720p60 video streaming, and I have no speed problems at all due to locking.
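For reference, a minimal sketch of that kind of lock-based circular buffer, using a pthread mutex as in the question's environment. The class name and interface are illustrative, not the answerer's actual CircularBuffer:

#include <pthread.h>
#include <algorithm>
#include <cstdint>
#include <cstring>

class LockedCircularBuffer {
public:
    explicit LockedCircularBuffer(size_t size)
        : buf_(new uint8_t[size]), size_(size), length_(0), offset_(0) {
        pthread_mutex_init(&mtx_, nullptr);
    }
    ~LockedCircularBuffer() { pthread_mutex_destroy(&mtx_); delete[] buf_; }

    // Copies at most n bytes in; returns how many bytes were actually stored.
    size_t write(const uint8_t* src, size_t n) {
        pthread_mutex_lock(&mtx_);
        n = std::min(n, size_ - length_);
        size_t woff  = (offset_ + length_) % size_;
        size_t first = std::min(n, size_ - woff);       // the write may wrap: two memcpys
        memcpy(buf_ + woff, src, first);
        memcpy(buf_, src + first, n - first);
        length_ += n;
        pthread_mutex_unlock(&mtx_);
        return n;
    }

    // Copies at most n bytes out; returns how many bytes were actually read.
    size_t read(uint8_t* dst, size_t n) {
        pthread_mutex_lock(&mtx_);
        n = std::min(n, length_);
        size_t first = std::min(n, size_ - offset_);    // the read may wrap: two memcpys
        memcpy(dst, buf_ + offset_, first);
        memcpy(dst + first, buf_, n - first);
        offset_ = (offset_ + n) % size_;
        length_ -= n;
        pthread_mutex_unlock(&mtx_);
        return n;
    }

private:
    pthread_mutex_t mtx_;
    uint8_t* buf_;
    size_t size_, length_, offset_;
};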

answered 2010-04-23T23:22:49.427

This is an old question, but nobody has given an answer that actually answers it. Since searches for (almost) the same question still rank this page highly, there should be an answer here, because one exists.

There may be more than one solution, but here is one implementation: https://github.com/tudinfse/FFQ. The conference paper referenced in the README describes the algorithm in detail.

answered 2018-11-01T17:46:07.943