c - Execute a part of the code once in CUDA

Question

I would like to know whether it is possible to make a static statement on the GPU.

Here is a part of my code:

__global__ void run_state(float *oldv, float* newv, float* w, int t)
{
  int i = threadIdx.x;

  nextState(i, oldv, newv, w, t);


  newv[0]   = newv[1];
  newv[N+1] = newv[N];

}

Can newv[0] = newv[1]; and newv[N+1] = newv[N]; be executed after the nextState call, and only once?

score 2 · Accepted Answer

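It is not clear to me what exactly you want to achieve (executed once per what?), so I will show a few alternatives.

If you want to execute the statements once per block, you can do that simply by testing the thread index: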

__syncthreads();
if ((threadIdx.x | threadIdx.y | threadIdx.z) == 0) {
    // statements that are only executed once per block
}
__syncthreads();

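If you want to execute the statements once per kernel invocation, you need to be more specific about when they should execute, because the blocks of a kernel run in no particular order.

A simple extension of the example above gives a version where the statements execute once per kernel invocation, but at an unspecified time: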

if ((threadIdx.x | threadIdx.y | threadIdx.z
     | blockIdx.x | blockIdx.y | blockIdx.z) == 0) {
    // statements that are executed once per kernel invocation,
    // at an unspecified time
}

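When the statements execute can be specified using atomic operations on global memory, at the cost of extra global memory bandwidth. To limit the performance impact, it is usually a good idea to perform these atomic operations only once per block.

If you want the statements to execute as soon as the first block reaches them, a simple test of a global flag is sufficient: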

__device__ volatile unsigned int statementHasExecuted;

...

__syncthreads();
if ((threadIdx.x | threadIdx.y | threadIdx.z) == 0) {
    // atomicMax (not atomicMin) so that the flag actually advances 0 -> 1 -> 2
    unsigned int state = atomicMax((unsigned int*)&statementHasExecuted, 1);
    if (state == 0) {
        // statements that are executed once per kernel invocation,
        // as soon as the first block reaches the statement

        // now make the results visible to the other blocks:
        __threadfence();
        // and signal that we are done:
        atomicMax((unsigned int*)&statementHasExecuted, 2);
    } else {
        while (state < 2) {
            // busy wait until execution of statement in other block has finished:
            state = statementHasExecuted;
        }
    }
}
__syncthreads();

statementHasExecuted needs to be reset to zero before each kernel launch.
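One way to do that reset from the host is cudaMemcpyToSymbol. This is only a minimal sketch: the helper name resetStatementFlag is my own, and the flag is redeclared here just so the snippet stands on its own.

```cuda
#include <cuda_runtime.h>

// Same flag as in the snippet above, redeclared so this sketch is self-contained.
__device__ volatile unsigned int statementHasExecuted;

// Hypothetical helper (name is an assumption): clears the flag from the host.
// Call it before every kernel launch that uses the flag.
cudaError_t resetStatementFlag()
{
    const unsigned int zero = 0;
    return cudaMemcpyToSymbol(statementHasExecuted, &zero, sizeof(zero));
}
```

Because the copy is issued before the launch, the kernel sees the flag as zero when it starts.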

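What you are after may well be the opposite case: a piece of code that executes only after all other blocks have finished. Appendix B.5, "Memory Fence Functions", of the Programming Guide gives example code for that.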
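A rough sketch of that "last block does the final step" idea, applied to the boundary copy from the question, might look like the following. It is not the guide's code: the names blocksDone, fillThenFixBoundary and runFillThenFixBoundary are made up, and the per-element work is a stand-in for nextState.

```cuda
#include <cuda_runtime.h>

// Hypothetical counter of finished blocks; must be zero before the first launch.
__device__ unsigned int blocksDone = 0;

// Hypothetical kernel: fills v[1..n], then the last block to finish
// copies the two boundary cells exactly once.
__global__ void fillThenFixBoundary(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // interior index 1..n
    if (i <= n) {
        v[i] = 2.0f * i;  // stand-in for the real per-element work
    }

    __shared__ bool isLastBlock;

    __threadfence();   // make this thread's write visible to other blocks
    __syncthreads();   // every thread of this block has passed its fence

    if (threadIdx.x == 0) {
        // Take a ticket; the block that draws the highest one is the last to finish.
        unsigned int ticket = atomicAdd(&blocksDone, 1u);
        isLastBlock = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    if (isLastBlock && threadIdx.x == 0) {
        volatile float *vv = v;  // volatile reads, so results of other blocks are fetched from memory
        v[0]     = vv[1];
        v[n + 1] = vv[n];
        blocksDone = 0;          // reset the counter for the next launch
    }
}

// Hypothetical host helper: runs the kernel on n interior cells (n >= 1)
// and copies the n + 2 results back into `out`.
cudaError_t runFillThenFixBoundary(float *out, int n)
{
    float *d_v = nullptr;
    cudaError_t err = cudaMalloc(&d_v, (n + 2) * sizeof(float));
    if (err != cudaSuccess) return err;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    fillThenFixBoundary<<<blocks, threads>>>(d_v, n);

    err = cudaMemcpy(out, d_v, (n + 2) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    return err;
}
```

Only one atomic operation is issued per block, which keeps the extra global memory traffic small.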

(Note that all code has been written in the browser and is untested - it may contain errors.)

score 0 · Accepted Answer

You designate a single thread to do the final work. For example:

__global__ void run_state(float *oldv, float* newv, float* w, int t)
{
  int i = threadIdx.x;

  nextState(i, oldv, newv, w, t);

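  // wait until every thread of the block has written its element,
  // otherwise thread 0 may read newv[1] and newv[N] too early
  __syncthreads();

  if (i == 0) {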
    newv[0]   = newv[1];
    newv[N+1] = newv[N];
  }
}
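Note that this kernel indexes with threadIdx.x only, so it relies on a single block. If the grid ever has several blocks, one simple option (a suggestion of mine, not something from the question) is to move the boundary copy into a tiny second kernel. Kernels launched into the same stream run in order, so it starts only after every block of the first kernel has finished. In the sketch below, updateInterior is a stand-in for run_state, and all names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Stand-in for run_state: writes the interior cells 1..n (hypothetical work).
__global__ void updateInterior(float *newv, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
    if (i <= n) newv[i] = 0.5f * i;
}

// Launched with one thread, so the two copies happen exactly once per launch.
__global__ void fixBoundary(float *newv, int n)
{
    newv[0]     = newv[1];
    newv[n + 1] = newv[n];
}

// Hypothetical host helper: one update step on n interior cells (n >= 1),
// followed by the boundary copy; the n + 2 results are copied into `out`.
cudaError_t stepWithBoundary(float *out, int n)
{
    float *d_newv = nullptr;
    cudaError_t err = cudaMalloc(&d_newv, (n + 2) * sizeof(float));
    if (err != cudaSuccess) return err;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    updateInterior<<<blocks, threads>>>(d_newv, n);
    fixBoundary<<<1, 1>>>(d_newv, n);  // same (default) stream: runs after updateInterior

    err = cudaMemcpy(out, d_newv, (n + 2) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_newv);
    return err;
}
```

The extra launch has some overhead, but it needs no atomics or fences inside the kernel.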
