c - C11 memory fence usage

Question

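Even for a simple 2-thread communication example, I have difficulty expressing this in the C11 atomic and memory_fence style so that I obtain proper memory ordering:

Shared data: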

volatile int flag, bucket;

Producer thread:

while (true) {
   int value = producer_work();
   while (atomic_load_explicit(&flag, memory_order_acquire))
      ; // busy wait
   bucket = value;
   atomic_store_explicit(&flag, 1, memory_order_release);
}

Consumer thread:

while (true) {
   while (!atomic_load_explicit(&flag, memory_order_acquire))
      ; // busy wait
   int data = bucket;
   atomic_thread_fence(/* memory_order ??? */);
   atomic_store_explicit(&flag, 0, memory_order_release);
   consumer_work(data);
}

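As far as I understand, the above code properly orders store-to-bucket -> flag-store -> flag-load -> load-from-bucket. However, I think there remains a race condition between the load from bucket and the next write of new data into bucket. To force an order following the bucket-read, I guess I would need an explicit atomic_thread_fence() between the bucket read and the following atomic_store. Unfortunately, there seems to be no memory_order argument that enforces anything on preceding loads, not even memory_order_seq_cst.

A really dirty solution could be to re-assign bucket a dummy value in the consumer thread, but that contradicts the notion of a read-only consumer.

In the older C99/GCC world I could use the traditional __sync_synchronize(), which I believe would be strong enough.

What would be the nicer C11-style solution to synchronize this so-called anti-dependency?

(Of course I am aware that I should avoid such low-level coding and use the available higher-level constructs, but I would like to understand...)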

score 3 · Accepted Answer

To force an order following the bucket-read, I guess I would need an explicit atomic_thread_fence() between the bucket read and the following atomic_store.

I do not believe the atomic_thread_fence() call is necessary: the flag update has release semantics, preventing any preceding load or store operations from being reordered across it. See the formal definition by Herb Sutter:

A write-release executes after all reads and writes by the same thread that precede it in program order.

This should prevent the read of bucket from being reordered to occur after the flag update, regardless of where the compiler chooses to store data.

That brings me to your comment about another answer:

The volatile ensures that there are ld/st operations generated, which can subsequently be ordered with fences. However, data is a local variable, not volatile. The compiler will probably put it in register, avoiding a store operation. That leaves the load from bucket to be ordered with the subsequent reset of flag.

It would seem that is not an issue if the bucket read cannot be reordered past the flag write-release, so volatile should not be necessary (though it probably doesn't hurt to have it, either). It's also unnecessary because most function calls (in this case, atomic_store_explicit(&flag)) serve as compile-time memory barriers. The compiler would not reorder the read of a global variable past a non-inlined function call because that function could modify the same variable.
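A minimal sketch of the argument above, assuming flag is declared as atomic_int (instead of volatile int) and bucket is a plain int. The names run_handoff and the counter that stands in for producer_work() are hypothetical, and POSIX threads are used only to make the example runnable; the consumer loop runs on the calling thread. No atomic_thread_fence() appears: the release store to flag keeps the preceding read of bucket from moving past it, and the producer's acquire load of flag keeps its write to bucket from moving before it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int flag;  /* 0 = bucket empty, 1 = bucket full */
static int bucket;       /* plain int, ordered only through flag */

/* Producer: hands over the values 1..count (stand-in for producer_work()). */
static void *producer(void *arg) {
    int count = *(int *)arg;
    for (int value = 1; value <= count; ++value) {
        while (atomic_load_explicit(&flag, memory_order_acquire))
            ; /* busy wait until the consumer has emptied the bucket */
        bucket = value;
        atomic_store_explicit(&flag, 1, memory_order_release);
    }
    return NULL;
}

/* Runs the consumer on the calling thread; returns the sum of all received
   values, or -1 if the producer thread could not be started. */
long run_handoff(int count) {
    pthread_t p;
    long sum = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&p, NULL, producer, &count) != 0)
        return -1;
    for (int i = 0; i < count; ++i) {
        while (!atomic_load_explicit(&flag, memory_order_acquire))
            ; /* busy wait until the producer has filled the bucket */
        int data = bucket;  /* read happens before the release store below */
        atomic_store_explicit(&flag, 0, memory_order_release);
        sum += data;        /* stand-in for consumer_work(data) */
    }
    pthread_join(p, NULL);
    return sum;
}
```

Calling run_handoff(1000) should return 500500 (the sum 1 + 2 + ... + 1000) if every value is handed over exactly once.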

I would also agree with @MaximYegorushkin that you could improve your busy-waiting with pause instructions when targeting compatible architectures. GCC and ICC both appear to have _mm_pause(void) intrinsics (probably equivalent to __asm__ ("pause;")).
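As a rough illustration of the pause suggestion, the busy-wait could be wrapped as follows. The helper names cpu_relax and spin_until_set are hypothetical, and the sketch assumes a GCC- or Clang-style compiler where __x86_64__ or __i386__ identifies an x86 target; on other targets the helper does nothing.

```c
#include <stdatomic.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  /* declares _mm_pause() */
#endif

/* Hint to the CPU that this is a spin-wait loop; no-op on non-x86 targets. */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
}

/* Spins until *f becomes non-zero; returns how many times it paused. */
unsigned long spin_until_set(atomic_int *f) {
    unsigned long spins = 0;
    while (!atomic_load_explicit(f, memory_order_acquire)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

The acquire load is unchanged from the question; the pause only affects how the core behaves while it waits, not the memory ordering.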

score 1 · Accepted Answer

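I agree with what @MikeStrobel says in his comment.

You don't need atomic_thread_fence() here because your critical sections start with acquire semantics and end with release semantics. Hence, reads within your critical sections cannot be reordered before the acquire, and writes cannot be reordered past the release. This is also why volatile is unnecessary here.

In addition, I don't see a reason why a (pthread) spinlock is not used here instead. A spinlock does a similar busy spin for you, but it also uses the pause instruction:

The pause intrinsic is used in spin-wait loops with processors implementing dynamic execution (especially out-of-order execution). In the spin-wait loop, the pause intrinsic improves the speed at which the code detects the release of the lock and provides an especially significant performance gain. The execution of the next instruction is delayed for an implementation-specific amount of time. The PAUSE instruction does not modify the architectural state. For dynamic scheduling, the PAUSE instruction reduces the penalty of exiting from the spin-loop.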
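One way the spinlock suggestion might look is sketched below. It assumes a platform that provides POSIX spinlocks (pthread_spin_lock is an optional POSIX feature and is missing on some systems); the names full, run_spinlock_handoff, and the counter standing in for producer_work() are hypothetical. Both bucket and the full/empty state are plain variables that are only touched while the lock is held.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pthread_spin_* under strict -std=c11 */
#include <pthread.h>
#include <stddef.h>

static pthread_spinlock_t lock;
static int full;    /* 1 while bucket holds an unread value; guarded by lock */
static int bucket;  /* guarded by lock */

/* Producer: hands over the values 1..count. */
static void *producer(void *arg) {
    int count = *(int *)arg;
    for (int value = 1; value <= count; ) {
        pthread_spin_lock(&lock);
        if (!full) {          /* only write when the bucket is empty */
            bucket = value++;
            full = 1;
        }
        pthread_spin_unlock(&lock);
    }
    return NULL;
}

/* Runs the consumer on the calling thread; returns the sum of all received
   values, or -1 if the lock or the producer thread could not be set up. */
long run_spinlock_handoff(int count) {
    pthread_t p;
    long sum = 0;
    if (pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    full = 0;
    if (pthread_create(&p, NULL, producer, &count) != 0) {
        pthread_spin_destroy(&lock);
        return -1;
    }
    for (int received = 0; received < count; ) {
        pthread_spin_lock(&lock);
        if (full) {           /* only read when the bucket is full */
            sum += bucket;
            full = 0;
            ++received;
        }
        pthread_spin_unlock(&lock);
    }
    pthread_join(p, NULL);
    pthread_spin_destroy(&lock);
    return sum;
}
```

Compared with the flag-based version, the lock acquire and release supply the ordering, at the cost of both threads contending for the same lock while they poll.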

score -1 · Accepted Answer

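Direct answer:

That your store is a memory_order_release operation means that your compiler must emit a memory fence for store instructions before the store of the flag. This is required to ensure that other processors see the final state of the released data before they start interpreting it. So, no, you don't need to add a second fence.

Long answer:

As noted above, what happens is that the compiler transforms your atomic_... instructions into combinations of fences and memory accesses; the fundamental abstraction is not the atomic load, it is the memory fence. That is how things work, even though the new C++ abstractions entice you to think differently. And I personally find memory fences much easier to think about than the contrived abstractions in C++.

From a hardware perspective, what you need to ensure is the relative order of your loads and stores, i.e. that the write to bucket completes before the flag is written in the producer, and that the load of flag reads an older value than the load of bucket in the consumer.

That said, what you actually need is: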

//producer
while(true) {
    int value = producer_work();
    while (flag) ; // busy wait
    atomic_thread_fence(memory_order_acquire);  //ensure that value is not assigned to bucket before the flag is lowered
    bucket = value;
    atomic_thread_fence(memory_order_release);  //ensure bucket is written before flag is
    flag = true;
}

//consumer
while(true) {
    while(!flag) ; // busy wait
    atomic_thread_fence(memory_order_acquire);  //ensure the value read from bucket is not older than the last value read from flag
    int data = bucket;
    atomic_thread_fence(memory_order_release);  //ensure data is loaded from bucket before the flag is lowered again
    flag = false;
    consumer_work(data);
}

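Note that the labels "producer" and "consumer" are misleading here, because we have two processes playing ping-pong, each one becoming producer and consumer in turn; it is just that one thread produces useful values, while the other produces "holes" to write useful values into...

atomic_thread_fence() is all you need, and since it translates directly to the assembler instructions below the atomic_... abstractions, it is guaranteed to be the fastest approach.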
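For comparison, here is a sketch of how the fence-based listing above might be written so that it is well-defined under the C11 memory model. It is an adaptation, not the answerer's code: it assumes flag is an atomic_int accessed with memory_order_relaxed (plain accesses to a shared int would be a data race in C11), and the names run_fence_handoff and the counter standing in for producer_work() are hypothetical. The fences sit in the same positions as in the listing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int flag;  /* 0 = bucket empty, 1 = bucket full */
static int bucket;       /* plain int, ordered by the fences */

/* Producer: hands over the values 1..count. */
static void *producer(void *arg) {
    int count = *(int *)arg;
    for (int value = 1; value <= count; ++value) {
        while (atomic_load_explicit(&flag, memory_order_relaxed))
            ; /* busy wait */
        atomic_thread_fence(memory_order_acquire);  /* bucket write stays after the flag read */
        bucket = value;
        atomic_thread_fence(memory_order_release);  /* bucket write stays before the flag write */
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the consumer on the calling thread; returns the sum of all received
   values, or -1 if the producer thread could not be started. */
long run_fence_handoff(int count) {
    pthread_t p;
    long sum = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&p, NULL, producer, &count) != 0)
        return -1;
    for (int i = 0; i < count; ++i) {
        while (!atomic_load_explicit(&flag, memory_order_relaxed))
            ; /* busy wait */
        atomic_thread_fence(memory_order_acquire);  /* bucket read stays after the flag read */
        int data = bucket;
        atomic_thread_fence(memory_order_release);  /* bucket read stays before the flag write */
        atomic_store_explicit(&flag, 0, memory_order_relaxed);
        sum += data;  /* stand-in for consumer_work(data) */
    }
    pthread_join(p, NULL);
    return sum;
}
```

In this form each release fence pairs with the acquire fence in the other thread through the relaxed accesses to flag, which is what orders the consumer's read of bucket before the producer's next write to it.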
