c++ - Is there a compiler optimization issue with variables shared between threads?

Question

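Consider the following example. The goal is to use two threads: one to "compute" a value and one to consume and use the computed value (I have tried to simplify this). The computing thread uses a condition variable to signal the other thread that the value has been computed and is ready, after which the waiting thread consumes the value.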

// Hopefully this is free from errors, if not, please point them out so I can fix
// them and we can focus on the main question
#include <pthread.h>
#include <stdio.h>

// The data passed to each thread. These could just be global variables.
typedef struct ThreadData
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int spaceHit;
} ThreadData;

// The "computing" thread... just asks you to press space and checks if you did or not
void* getValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);

  printf("Please hit space and press enter\n");
  data->spaceHit = getchar() == ' ';
  pthread_cond_signal(&data->cond);

  pthread_mutex_unlock(&data->mutex);

  return NULL;
}

// The "consuming" thread... waits for the value to be set and then uses it
void* watchValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);
  if (!data->spaceHit)
      pthread_cond_wait(&data->cond, &data->mutex);
  pthread_mutex_unlock(&data->mutex);

  if (data->spaceHit)
      printf("You hit space!\n");
  else
    printf("You did NOT hit space!\n");

  return NULL;
}

int main()
{
  // Boring main function. Just initializes things and starts the two threads.
  pthread_t threads[2];
  pthread_attr_t attr;
  ThreadData data;
  data.spaceHit = 0;

  pthread_mutex_init(&data.mutex, NULL);
  pthread_cond_init(&data.cond, NULL);

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  pthread_create(&threads[0], &attr, watchValue, &data);
  pthread_create(&threads[1], &attr, getValue, &data);

  pthread_join(threads[0], NULL);
  pthread_join(threads[1], NULL);

  pthread_attr_destroy(&attr);
  pthread_mutex_destroy(&data.mutex);
  pthread_cond_destroy(&data.cond);

  return 0;
}

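My main question concerns potential optimizations done by the compiler. Is the compiler allowed to perform tricky optimizations and "optimize" the program flow so that the following happens: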

void* watchValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);
  if (!data->spaceHit) // Here, it might remember the result of data->spaceHit
      pthread_cond_wait(&data->cond, &data->mutex);
  pthread_mutex_unlock(&data->mutex);

  if (remember the old result of data->spaceHit without re-getting it)
      printf("You hit space!\n");
  else
    printf("You did NOT hit space!\n");
  // The above if statement now may not execute correctly because it didn't
  // re-get the value of data->spaceHit, but "remembered" the old result
  // from the if statement a few lines above

  return NULL;
}

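I am a bit paranoid that the compiler's static analysis might determine that data->spaceHit does not change between the two if statements, and so justify using the old value of data->spaceHit instead of re-fetching the new value. I do not know enough about threading and compiler optimizations to tell whether this code is safe. Is it?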

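Note: I wrote this in C and tagged it as both C and C++. I am using it in a C++ library, but since I am using C threading APIs (pthreads and Win32 threads) and have the option to embed C in this part of the C++ library, I tagged it as both C and C++.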

score 9 · Accepted Answer

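No, the compiler is not allowed to cache the value of data->spaceHit across the calls to pthread_cond_wait() or pthread_mutex_unlock(). Both are specifically called out as "functions [which] synchronize memory with respect to other threads", and they must act as compiler barriers.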

For a compiler to be part of a conforming pthreads implementation, it must not perform that optimization in the case you have given.

score 6 · Accepted Answer

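In general, sharing data between threads raises not only compiler optimization issues but also hardware optimization issues when those threads run on different processors that can execute instructions out of order.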

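However, the pthread_mutex_lock and pthread_mutex_unlock functions must defeat not only compiler caching optimizations but also any hardware reordering optimizations. If thread A prepares some shared data and then "publishes" it by performing an unlock, this must appear consistent to other threads. For example, it must not appear on another processor that the lock has been released while the updates to the shared variables have not yet completed. So the functions have to execute any necessary memory barriers. All of that would be for naught if the compiler could move data accesses around the calls to these functions, or cache things at the register level in a way that breaks coherency.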
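The "publish by unlock" idea can be illustrated with a small sketch. It is only an illustration under stated assumptions: the names Published, publisher, and readPublished are hypothetical, error checking is omitted, and the reader polls purely to keep the example short (a condition variable or semaphore would normally replace the polling loop).

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical shared record: every access happens with the mutex held. */
typedef struct Published
{
  pthread_mutex_t mutex;
  int payload;
  int published;
} Published;

/* Thread A: prepare the data, then "publish" it by unlocking. */
static void* publisher(void* arg)
{
  Published* p = arg;

  pthread_mutex_lock(&p->mutex);
  p->payload = 42;   /* prepare the shared data */
  p->published = 1;  /* mark it as ready */
  pthread_mutex_unlock(&p->mutex);

  return NULL;
}

/* Thread B: once "published" is seen under the lock, "payload" is
   guaranteed to be the value written before the matching unlock. */
int readPublished(void)
{
  Published p;
  pthread_t thread;
  int seen = 0;
  int value = -1;

  pthread_mutex_init(&p.mutex, NULL);
  p.payload = 0;
  p.published = 0;
  pthread_create(&thread, NULL, publisher, &p);

  while (!seen)
  {
    pthread_mutex_lock(&p.mutex);
    seen = p.published;
    if (seen)
      value = p.payload;
    pthread_mutex_unlock(&p.mutex);
    if (!seen)
      sched_yield();  /* polling is for illustration only */
  }

  pthread_join(thread, NULL);
  pthread_mutex_destroy(&p.mutex);
  return value;
}
```

Because both threads touch the fields only while holding the mutex, the reader can never observe published set without also observing the payload written before the unlock.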

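So, from that perspective, the code you have is safe. However, it has other problems. The pthread_cond_wait function should always be called in a loop that re-tests the variable, because spurious wakeups can occur for any reason.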

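Signaling a condition is stateless, so the waiting thread can block forever. Just because you call pthread_cond_signal unconditionally in the getValue input thread does not mean that watchValue will fall through the wait. It is possible for getValue to execute first and for spaceHit not to be set. Then watchValue enters the mutex, sees that spaceHit is false, and executes a wait that may be indefinite. (The only thing that can rescue it then is a spurious wakeup, ironically because there is no loop.)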
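Both problems (spurious wakeups and a signal that arrives before the waiter does) are commonly handled by waiting in a loop on a separate "ready" predicate rather than on the value itself. The following is a minimal sketch, not the asker's code: ReadyData, readyProducer, readyConsumer, runReadyFlagDemo, and the input field (a stand-in for the character returned by getchar()) are all hypothetical names, and error checking is omitted.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of ThreadData: "ready" is the predicate,
   "spaceHit" is the payload. */
typedef struct ReadyData
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool ready;
  int input;     /* stand-in for the character read by getchar() */
  int spaceHit;
} ReadyData;

/* Producer: store the value and set the predicate under the mutex. */
static void* readyProducer(void* arg)
{
  ReadyData* data = arg;

  pthread_mutex_lock(&data->mutex);
  data->spaceHit = (data->input == ' ');
  data->ready = true;
  pthread_cond_signal(&data->cond);
  pthread_mutex_unlock(&data->mutex);

  return NULL;
}

/* Consumer: loop on the predicate. A spurious wakeup simply re-tests it,
   and an early signal is not lost because "ready" is already true. */
static void* readyConsumer(void* arg)
{
  ReadyData* data = arg;
  int hit;

  pthread_mutex_lock(&data->mutex);
  while (!data->ready)
    pthread_cond_wait(&data->cond, &data->mutex);
  hit = data->spaceHit;  /* read the payload while holding the mutex */
  pthread_mutex_unlock(&data->mutex);

  return (void*)(intptr_t)hit;
}

/* Runs both threads once; returns 1 if ch is a space, otherwise 0. */
int runReadyFlagDemo(int ch)
{
  ReadyData data;
  pthread_t producer, consumer;
  void* result = NULL;

  pthread_mutex_init(&data.mutex, NULL);
  pthread_cond_init(&data.cond, NULL);
  data.ready = false;
  data.input = ch;
  data.spaceHit = 0;

  /* The producer is started first on purpose: the order does not matter. */
  pthread_create(&producer, NULL, readyProducer, &data);
  pthread_create(&consumer, NULL, readyConsumer, &data);
  pthread_join(producer, NULL);
  pthread_join(consumer, &result);

  pthread_cond_destroy(&data.cond);
  pthread_mutex_destroy(&data.mutex);
  return (int)(intptr_t)result;
}
```

Calling runReadyFlagDemo('x') terminates even though spaceHit stays 0, which is exactly the case in which waiting on spaceHit itself could block forever.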

Basically, the logic you seem to be looking for is a simple semaphore:

// Consumer:
wait(data_ready_semaphore);
use(data);

// Producer:
data = produce();
signal(data_ready_semaphore);

In this style of interaction, we do not need a mutex around the use of data->spaceHit in your watchValue. More concretely, using POSIX semaphore syntax:

// "watchValue" consumer
sem_wait(&ready_semaphore);
if (data->spaceHit)
  printf("You hit space!\n");
else
  printf("You did NOT hit space!\n");

// "getValue" producer
data->spaceHit = getchar() == ' ';
sem_post(&ready_semaphore);

Perhaps the real code that you simplified into this example can just use semaphores.
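For completeness, here is one way the semaphore fragments above might be fleshed out into a compilable unit. This is a sketch under assumptions: SemData, semProducer, semConsumer, runSemaphoreDemo, and the input field (standing in for the getchar() result) are hypothetical, thread-creation errors are not checked, and unnamed semaphores created with sem_init are not available on every platform (macOS, for example, does not support them).

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared state: the semaphore replaces mutex + condition. */
typedef struct SemData
{
  sem_t ready;
  int input;     /* stand-in for the character read by getchar() */
  int spaceHit;
} SemData;

/* "getValue" producer: write the value, then post. */
static void* semProducer(void* arg)
{
  SemData* data = arg;

  data->spaceHit = (data->input == ' ');
  sem_post(&data->ready);

  return NULL;
}

/* "watchValue" consumer: wait for the post, then read the value. */
static void* semConsumer(void* arg)
{
  SemData* data = arg;

  /* sem_wait can fail with EINTR if a signal handler runs; retry then. */
  while (sem_wait(&data->ready) == -1 && errno == EINTR)
    continue;

  return (void*)(intptr_t)data->spaceHit;
}

/* Runs both threads once; returns 1 if ch is a space, 0 if it is not,
   and -1 if the semaphore could not be created. */
int runSemaphoreDemo(int ch)
{
  SemData data;
  pthread_t consumer, producer;
  void* result = NULL;

  data.input = ch;
  data.spaceHit = 0;
  if (sem_init(&data.ready, 0, 0) != 0)
    return -1;

  pthread_create(&consumer, NULL, semConsumer, &data);
  pthread_create(&producer, NULL, semProducer, &data);
  pthread_join(consumer, &result);
  pthread_join(producer, NULL);

  sem_destroy(&data.ready);
  return (int)(intptr_t)result;
}
```

Unlike a condition signal, the semaphore count is stateful: if the producer posts before the consumer starts waiting, sem_wait returns immediately instead of blocking.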

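P.S. Also, pthread_cond_signal does not have to be inside the mutex. It may call into the operating system, so a mutex-protected region that needs only a handful of machine instructions to protect the shared variables can blow up to hundreds of machine cycles because it contains the signaling call.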
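A sketch of signaling after the unlock might look as follows. The names Mailbox, mailboxPut, mailboxTake, and runMailboxDemo are hypothetical and error checking is omitted. The predicate is still modified under the mutex; only the signal call moves outside the critical section.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox protected by a mutex. */
typedef struct Mailbox
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool ready;
  int value;
} Mailbox;

/* Producer side: the critical section only updates the shared fields. */
void mailboxPut(Mailbox* box, int value)
{
  pthread_mutex_lock(&box->mutex);
  box->value = value;
  box->ready = true;
  pthread_mutex_unlock(&box->mutex);

  /* Signal after unlocking. The condition variable must still exist
     here, so it must not be destroyed before this call returns. */
  pthread_cond_signal(&box->cond);
}

/* Consumer side: the usual predicate loop. */
int mailboxTake(Mailbox* box)
{
  int value;

  pthread_mutex_lock(&box->mutex);
  while (!box->ready)
    pthread_cond_wait(&box->cond, &box->mutex);
  value = box->value;
  pthread_mutex_unlock(&box->mutex);

  return value;
}

static void* mailboxConsumer(void* arg)
{
  return (void*)(intptr_t)mailboxTake(arg);
}

/* Sends one value through the mailbox and returns what was received. */
int runMailboxDemo(int value)
{
  Mailbox box;
  pthread_t consumer;
  void* result = NULL;

  pthread_mutex_init(&box.mutex, NULL);
  pthread_cond_init(&box.cond, NULL);
  box.ready = false;
  box.value = 0;

  pthread_create(&consumer, NULL, mailboxConsumer, &box);
  mailboxPut(&box, value);  /* returns only after the signal is sent */
  pthread_join(consumer, &result);

  pthread_cond_destroy(&box.cond);
  pthread_mutex_destroy(&box.mutex);
  return (int)(intptr_t)result;
}
```

Here the main thread is the producer, so mailboxPut has finished signaling before the condition variable is destroyed.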

score -1 · Accepted Answer

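(Later edit: it looks like subsequent answers provide a better answer to this query. I will leave this answer here as a reference for an approach that does not answer the question. Please advise if you recommend a different approach.)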

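The type ThreadData itself is not volatile.

Instantiated as 'data' in main(), it is volatile. The pointers 'data' in getValue() and watchValue() also point to the volatile version of type 'ThreadData'.

While I like the tightness of the first answer, rewriting

ThreadData data;  // main()
ThreadData* data; // getValue(), watchValue()

to

volatile ThreadData data;  // main()
volatile ThreadData* data; // getValue(), watchValue()
                           // Pointer `data` is not volatile, what it points to is volatile.

might be better. It ensures that any access to a ThreadData member is always re-read rather than optimized away. If you add other fields to ThreadData, they are protected in the same way.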
