c++ - C++ AMP atomics

Question

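I am rewriting an algorithm in C++ AMP and have run into a problem with atomic writes, more specifically with atomic_fetch_add, which apparently only works on integers?

I need to add a double_4 (or a float_4 if I have to) atomically. How do I do that with C++ AMP's atomics?

Is the best/only solution really to have a lock variable that my code uses to control the writes? I actually need to do atomic writes to a long list of output doubles, so I would basically need one lock per output.

I have already considered tiling this for better performance, but right now I am only on the first iteration.

EDIT: Thanks for the quick answers already given. I have a quick update to my question, though.

I made the following lock attempt, but it seems that when one thread in a warp gets past the lock, all the other threads in the same warp just follow along. I was expecting only the first warp thread to get the lock, but I must be missing something (note that it has been quite a few years since my CUDA days, so I have just gotten dumb).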

parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
{
   .....
   for (int j = 0; j < attracted.extent.size(); j++)
   {
      ...
      int lock = 0; //the expected lock value
      while (!atomic_compare_exchange(&locks[j], &lock, 1));
      //when one warp thread gets the lock, ALL threads continue on
      ...
      acceleration[j] += ...; //locked write
      locks[j] = 0; //leaving the lock again
   }
});

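This is not a big problem, since I should be writing to a shared variable first and only writing it to global memory once all threads in a tile have completed, but I just don't understand this behavior.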

score 2 · Accepted Answer

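All the atomic add operations work only on integer types. For float_4 (I am assuming this is 4 floats) you can do what you want without locks by using a 128-bit CAS (compare-and-swap) operation, but there is no 256-bit CAS operation, which is what you would need for double_4. What you need is a loop that atomically reads the float_4 from memory, performs the floating-point add in the regular way, and then uses CAS to test and swap the value if it is still the original (and loops if it is not, i.e. some other thread changed the value between the read and the write). Note that 128-bit CAS is only available on 64-bit architectures and that your data needs to be properly aligned.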
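The read, add, compare-and-swap loop described above can be sketched on the host with `std::atomic<float>`. This is only an illustration of the retry pattern for a single scalar, not C++ AMP kernel code and not a 128-bit float_4 version; the function name `atomic_add_float` is made up for this example.

```cpp
#include <atomic>

// Atomically add `delta` to `target` using a compare-and-swap retry loop.
// Returns the value that was stored immediately before the addition.
// (Hypothetical helper; host-side stand-in for the GPU pattern.)
inline float atomic_add_float(std::atomic<float>& target, float delta)
{
    float expected = target.load();
    // If another thread changed `target` since we read it, the exchange
    // fails and reloads `expected` with the current value, so the sum
    // `expected + delta` is recomputed from fresh data on every retry.
    while (!target.compare_exchange_weak(expected, expected + delta))
    {
        // nothing to do here; the loop condition performs the retry
    }
    return expected;
}
```

The loop never holds a lock: a thread that loses the race simply recomputes the sum and tries again.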

score 1 · Accepted Answer

If the critical section is short, you can create your own lock using atomic operations:

int lock = 1;

while(__sync_lock_test_and_set(&lock, 0) == 0) // trying to acquire lock
{
 //yield the thread or go to sleep
} 

//critical section, do the work

// release lock
lock = 1;

The advantage is that you save the overhead of an OS lock.
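For comparison, here is a minimal sketch of the same test-and-set idea written with the standard `std::atomic_flag` instead of the GCC `__sync` builtin. It is host-side C++ rather than C++ AMP, and the class name `SpinLock` and its members are illustrative only.

```cpp
#include <atomic>

// Minimal spin lock built on std::atomic_flag (illustrative names).
class SpinLock
{
public:
    // Try once; returns true if this call acquired the lock.
    bool try_lock()
    {
        // test_and_set returns the previous state: false means it was free.
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // Busy-wait until the lock is acquired.
    void lock()
    {
        while (!try_lock())
        {
            // spin; a real implementation might yield or back off here
        }
    }

    // Release the lock so another thread can take it.
    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Unlike a plain `lock = 1;` assignment, `clear` with release ordering makes the writes done inside the critical section visible to the next thread that acquires the lock.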

score 0 · Accepted Answer

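Others have already answered the question: you need to handle the double atomics yourself. There is no function for it in the library.

I would also like to elaborate on my own edit, in case others arrive here with the same failing lock.

In the example below, my mistake was not realizing that when the exchange fails, it actually changes the expected value! The first thread expects the lock to be zero and writes a 1 into it. The next thread expects 0 and fails to write its 1, but the exchange then writes a 1 into the variable holding the expected value. This means that the next time the thread attempts the exchange, it expects a 1 in the lock! It finds one, and so it thinks it has acquired the lock.

I had no idea that the variable behind &lock would receive a 1 when the exchange fails to match!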

parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
{
   .....
   for (int j = 0; j < attracted.extent.size(); j++)
   {
      ...
      int lock = 0; //the expected lock value

      //note that, if locks[j]!=lock then lock=1
      //meaning that ACE will be true the next time if locks[j]==1
      //meaning the while will terminate even though someone else has the lock
      while (!atomic_compare_exchange(&locks[j], &lock, 1));
      //when one warp thread gets the lock, ALL threads continue on
      ...
      acceleration[j] += ...; //locked write
      locks[j] = 0; //leaving the lock again
   }
});

This seems to fix the problem:

parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
{
   .....
   for (int j = 0; j < attracted.extent.size(); j++)
   {
      ...
      int lock = 0; //the expected lock value

      while (!atomic_compare_exchange(&locks[j], &lock, 1))
      {
          lock=0; //reset the expected value
      }
      //only the thread that actually acquired the lock gets here
      ...
      acceleration[j] += ...; //locked write
      locks[j] = 0; //leaving the lock again
   }
});
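The pitfall described in this answer can be reproduced on the host with `std::atomic<int>`, whose `compare_exchange_strong` also overwrites the expected value when the comparison fails. The sketch below is a stand-in for the AMP call, not AMP code, and all function names are made up for illustration.

```cpp
#include <atomic>

// Returns what is left in `expected` after a failed compare-exchange
// against a lock word that already holds 1 (the lock is taken).
inline int expected_after_failed_cas()
{
    std::atomic<int> lock_word{1};  // someone else holds the lock
    int expected = 0;               // we hope to find it free
    bool acquired = lock_word.compare_exchange_strong(expected, 1);
    // The exchange fails and `expected` now holds the observed value.
    return acquired ? -1 : expected;
}

// The bug: retrying with the stale `expected` compares 1 against 1,
// so the second attempt "succeeds" although the lock is still held.
inline bool buggy_second_attempt_succeeds()
{
    std::atomic<int> lock_word{1};
    int expected = 0;
    lock_word.compare_exchange_strong(expected, 1);        // fails
    return lock_word.compare_exchange_strong(expected, 1); // wrongly true
}

// The fix: reset the expected value before every retry.
inline void acquire(std::atomic<int>& lock_word)
{
    int expected = 0;
    while (!lock_word.compare_exchange_weak(expected, 1))
    {
        expected = 0;  // the failed exchange overwrote it
    }
}

// Give the lock back by storing 0 atomically.
inline void release(std::atomic<int>& lock_word)
{
    lock_word.store(0);
}
```

Resetting `expected` inside the loop body mirrors the `lock=0;` line in the corrected kernel above.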
