cuda - atomicInc() is not working

Question

I tried the following program, which uses atomicInc().

__global__ void ker(int *count)
{
    int n=1;
    int x = atomicInc ((unsigned int *)&count[0],n);
    CUPRINTF("In kernel count is %d\n",count[0]);
}

int main()
{
    int hitCount[1];
    int *hitCount_d;

    hitCount[0]=1;
    cudaMalloc((void **)&hitCount_d,1*sizeof(int));

    cudaMemcpy(&hitCount_d[0],&hitCount[0],1*sizeof(int),cudaMemcpyHostToDevice);

    ker<<<1,4>>>(hitCount_d);

    cudaMemcpy(&hitCount[0],&hitCount_d[0],1*sizeof(int),cudaMemcpyDeviceToHost);

    printf("count is %d\n",hitCount[0]);
    return 0;
}

The output is:

In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1

count is 1

I don't understand why the count is not incrementing. Can anyone help?

score 10 · Accepted Answer

According to the documentation, atomicInc does the following.

For this call:

atomicInc ((unsigned int *)&count[0],n);

it computes:

((count[0] >= n) ? 0 : (count[0]+1))

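and stores the result back in count[0]. The function returns the old value of count[0].

(The ?: in that expression is the C/C++ conditional (ternary) operator.)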

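Since you passed n = 1 and count[0] starts at 1, atomicInc never actually increments count[0] beyond 1.

If you want to see it increment beyond 1, pass a larger value for n.

The variable n effectively acts as a "rollover value" for the incrementing process. Once the variable being incremented reaches the value n, the next atomicInc resets it to zero.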
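
The rollover behavior can be checked on the host with a small C++ model of the rule above. This is only a sketch, not the CUDA intrinsic: atomic_inc_model and run_model are hypothetical names, and the calls are applied one after another rather than by concurrent GPU threads.

```cpp
// Host-side model of the update rule quoted above (NOT the CUDA intrinsic).
// Like atomicInc, it stores the wrapped result and returns the old value.
unsigned int atomic_inc_model(unsigned int *addr, unsigned int limit)
{
    unsigned int old = *addr;
    *addr = (old >= limit) ? 0u : old + 1u;
    return old;
}

// Applies the rule `calls` times in sequence and returns the final value.
unsigned int run_model(unsigned int start, unsigned int limit, int calls)
{
    unsigned int value = start;
    for (int i = 0; i < calls; ++i) {
        atomic_inc_model(&value, limit);
    }
    return value;
}
```

With the values from the question, run_model(1, 1, 4) ends at 1, whereas a larger limit such as run_model(1, 100, 4) ends at 5.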

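Although you didn't ask this, you might wonder: "If I am hitting the rollover value, why do I never see a value of zero?"

To answer that, remember that all 4 of your threads are executing in lockstep. All 4 of them execute the atomicInc instruction before any of them executes the subsequent print statement.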

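So we have a variable, count[0], that starts out at 1.

The first thread to execute the atomic resets it to zero.
The next thread increments it to 1.
The third thread resets it to zero.
The fourth and final thread increments it to 1.

Then all 4 threads print out the value.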
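
The four steps above can be replayed on the host. The sketch below assumes the atomics are serialized in some order, which is all the wrap-around rule needs; trace_model and atomic_inc_model are hypothetical helper names, not part of CUDA.

```cpp
#include <vector>

// Host-side model of the atomicInc update rule (NOT the CUDA intrinsic).
unsigned int atomic_inc_model(unsigned int *addr, unsigned int limit)
{
    unsigned int old = *addr;
    *addr = (old >= limit) ? 0u : old + 1u;
    return old;
}

// Records the stored value after each of `threads` serialized calls.
std::vector<unsigned int> trace_model(unsigned int start, unsigned int limit,
                                      int threads)
{
    std::vector<unsigned int> seen;
    unsigned int value = start;
    for (int t = 0; t < threads; ++t) {
        atomic_inc_model(&value, limit);
        seen.push_back(value);
    }
    return seen;
}
```

Calling trace_model(1, 1, 4) yields the sequence 0, 1, 0, 1, so the value left in memory when the prints run is 1, which matches the output in the question.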

As another experiment, try launching 5 threads instead of 4, and see whether you can predict the value that will be printed.

ker<<<1,5>>>(hitCount_d);

As @talonmies pointed out in the comments, if you swap your atomicInc for an atomicAdd:

int x = atomicAdd ((unsigned int *)&count[0],n);

you will get the result you were probably expecting.
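
For comparison, here is a host-side analogue of the atomicAdd version. It uses std::atomic::fetch_add from standard C++, which, like atomicAdd, adds with no rollover limit and returns the previous value. add_model is a hypothetical name, and the loop stands in for the four GPU threads.

```cpp
#include <atomic>

// Host-side analogue of calling atomicAdd(addr, n) once per thread.
// fetch_add has no wrap-around limit, so every call contributes n.
unsigned int add_model(unsigned int start, unsigned int n, int calls)
{
    std::atomic<unsigned int> value{start};
    for (int i = 0; i < calls; ++i) {
        value.fetch_add(n);  // returns the old value, ignored here
    }
    return value.load();
}
```

Starting from 1 with n = 1, add_model(1, 1, 4) returns 5: one increment per thread on top of the initial value.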
