cuda - CUDA: Globally unique thread index in a 3D grid

Question

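As the title says: if I have a 3D grid of blocks, what is the formula for computing a globally unique index for a thread?

Let's keep the blocks themselves one-dimensional.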

score 7 · Accepted Answer

// unique block index inside a 3D block grid
// (cast to 64 bits before multiplying so the products cannot overflow 32-bit unsigned arithmetic)
const unsigned long long int blockId = blockIdx.x                      // 1D
        + (unsigned long long int)blockIdx.y * gridDim.x               // 2D
        + (unsigned long long int)gridDim.x * gridDim.y * blockIdx.z;  // 3D

// global unique thread index, block dimension uses only x-coordinate
const unsigned long long int threadId = blockId * blockDim.x + threadIdx.x;
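The formula can be checked off the GPU. The sketch below is plain C++; `Dim3`, `globalThreadId` and `isBijection` are hypothetical stand-ins for the CUDA built-ins `gridDim`, `blockIdx`, `blockDim.x` and `threadIdx.x`, assuming a 1D block as in the question. It enumerates every (block, thread) pair of a small grid and checks that each index in `[0, totalThreads)` is produced exactly once.

```cpp
#include <cassert>  // for assert() in the usage checks
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3 built-in types.
struct Dim3 {
    unsigned int x, y, z;
};

// Same formula as the answer: linear block id in a 3D grid,
// then a 1D thread offset inside the block.
std::uint64_t globalThreadId(Dim3 grid, Dim3 blk,
                             unsigned int blockDimX, unsigned int threadIdxX) {
    const std::uint64_t blockId = blk.x
        + static_cast<std::uint64_t>(blk.y) * grid.x
        + static_cast<std::uint64_t>(grid.x) * grid.y * blk.z;
    return blockId * blockDimX + threadIdxX;
}

// Returns true if every thread of the launch maps to a distinct index in
// [0, total). Since there are exactly `total` threads, that means every
// index is hit exactly once.
bool isBijection(Dim3 grid, unsigned int blockDimX) {
    const std::uint64_t total =
        static_cast<std::uint64_t>(grid.x) * grid.y * grid.z * blockDimX;
    std::vector<int> hits(total, 0);
    for (unsigned int z = 0; z < grid.z; ++z)
        for (unsigned int y = 0; y < grid.y; ++y)
            for (unsigned int x = 0; x < grid.x; ++x)
                for (unsigned int t = 0; t < blockDimX; ++t) {
                    const std::uint64_t id = globalThreadId(grid, {x, y, z}, blockDimX, t);
                    // Out of range, or this index was already produced.
                    if (id >= total || hits[id]++ != 0) return false;
                }
    return true;
}
```

For example, `isBijection({4, 3, 2}, 32)` returns `true`, and thread 5 of block (1, 2, 1) in that grid gets index (1 + 2·4 + 4·3·1) · 32 + 5 = 677.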

score 1 · Accepted Answer

A bit late to the party, but this is how I usually handle it, in a very general way, since it supports any number and size of blocks (even 2D):

// Compute the offset in each dimension
// (cast to size_t before multiplying so large grids cannot overflow 32-bit arithmetic)
const size_t offsetX = (size_t)blockDim.x * blockIdx.x + threadIdx.x;
const size_t offsetY = (size_t)blockDim.y * blockIdx.y + threadIdx.y;
const size_t offsetZ = (size_t)blockDim.z * blockIdx.z + threadIdx.z;

// Make sure that you are not actually outside the data
// (sizeX, sizeY, sizeZ are assumed to be the data extents, e.g. kernel arguments)
if (offsetX >= sizeX || offsetY >= sizeY || offsetZ >= sizeZ)
  return;

// Compute the linear index assuming X, Y, then Z memory ordering (X varies fastest)
const size_t idx = offsetZ * sizeX * sizeY + offsetY * sizeX + offsetX;
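To see that this approach touches each element of a `sizeX × sizeY × sizeZ` volume exactly once, the kernel body can be emulated on the CPU. This is a minimal sketch, assuming 3D blocks and a grid sized by round-up division in each dimension; `Dim3`, `linearIndex3D`, `ceilDiv` and `coversVolumeOnce` are hypothetical helpers, not CUDA APIs.

```cpp
#include <cassert>  // for assert() in the usage checks
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3 built-in types.
struct Dim3 {
    unsigned int x, y, z;
};

// Mirrors the answer's kernel body for one thread. Returns false where the
// kernel would `return` early because the thread lies outside the volume.
bool linearIndex3D(Dim3 bdim, Dim3 bidx, Dim3 tidx,
                   std::size_t sizeX, std::size_t sizeY, std::size_t sizeZ,
                   std::size_t& idx) {
    const std::size_t offsetX = static_cast<std::size_t>(bdim.x) * bidx.x + tidx.x;
    const std::size_t offsetY = static_cast<std::size_t>(bdim.y) * bidx.y + tidx.y;
    const std::size_t offsetZ = static_cast<std::size_t>(bdim.z) * bidx.z + tidx.z;
    if (offsetX >= sizeX || offsetY >= sizeY || offsetZ >= sizeZ)
        return false;
    // X varies fastest, then Y, then Z.
    idx = offsetZ * sizeX * sizeY + offsetY * sizeX + offsetX;
    return true;
}

// Rounds a / b up, so the grid covers the whole volume.
unsigned int ceilDiv(std::size_t a, unsigned int b) {
    return static_cast<unsigned int>((a + b - 1) / b);
}

// Runs every thread of a covering grid on the CPU and checks that each
// element of the volume is written exactly once.
bool coversVolumeOnce(Dim3 block, std::size_t sizeX, std::size_t sizeY, std::size_t sizeZ) {
    const Dim3 grid{ceilDiv(sizeX, block.x), ceilDiv(sizeY, block.y), ceilDiv(sizeZ, block.z)};
    std::vector<int> hits(sizeX * sizeY * sizeZ, 0);
    for (unsigned int bz = 0; bz < grid.z; ++bz)
      for (unsigned int by = 0; by < grid.y; ++by)
        for (unsigned int bx = 0; bx < grid.x; ++bx)
          for (unsigned int tz = 0; tz < block.z; ++tz)
            for (unsigned int ty = 0; ty < block.y; ++ty)
              for (unsigned int tx = 0; tx < block.x; ++tx) {
                  std::size_t idx = 0;
                  if (linearIndex3D(block, {bx, by, bz}, {tx, ty, tz},
                                    sizeX, sizeY, sizeZ, idx))
                      ++hits[idx];
              }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

For a 10 × 7 × 5 volume with 4 × 4 × 2 blocks the grid is 3 × 2 × 3; the surplus threads in the last block of each dimension fail the bounds check, and `coversVolumeOnce({4, 4, 2}, 10, 7, 5)` returns `true`.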

Note that I'm not a CUDA ninja.

score 0 · Accepted Answer

@djmj's existing answer is good, but some reformatting makes it clearer what is going on (at least to my brain, which is new to CUDA):

long long blockId = (long long)blockIdx.z  *  gridDim.x * gridDim.y
                  + (long long)blockIdx.y  *  gridDim.x
                  + blockIdx.x;
long long threadsPerBlock = blockDim.x;
long long i = blockId * threadsPerBlock + threadIdx.x;

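blockId is the number of blocks in the complete z-dimension "slices" (2D grids), plus the blocks in the complete rows of the final (incomplete) slice, plus the blocks in the last (incomplete) row of that slice.

"Complete" refers to the blocks that come "before" the current (x, y, z) block, in the order in which we sum them to determine the overall block id.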
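To make the counting argument concrete: in a hypothetical 4 × 3 × 2 grid, block (1, 2, 1) has one full z-slice of 4 · 3 = 12 blocks before it, two full rows of 4 blocks (8), and 1 earlier block in its own row, so its id is 12 + 8 + 1 = 21. The sketch below (plain C++; `Dim3`, `blockIdFormula` and `blockIdByCounting` are hypothetical names) compares the formula with a literal count of the blocks that come before.

```cpp
#include <cassert>  // for assert() in the usage checks
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3/uint3 built-in types.
struct Dim3 {
    unsigned int x, y, z;
};

// The formula from the answer: full z-slices, then full rows, then the rest.
std::int64_t blockIdFormula(Dim3 grid, Dim3 b) {
    return static_cast<std::int64_t>(b.z) * grid.x * grid.y
         + static_cast<std::int64_t>(b.y) * grid.x
         + b.x;
}

// Literal reading of the explanation: count every block that comes "before"
// b when blocks are visited with x fastest, then y, then z.
std::int64_t blockIdByCounting(Dim3 grid, Dim3 b) {
    std::int64_t before = 0;
    for (unsigned int z = 0; z < grid.z; ++z)
        for (unsigned int y = 0; y < grid.y; ++y)
            for (unsigned int x = 0; x < grid.x; ++x) {
                if (z == b.z && y == b.y && x == b.x)
                    return before;
                ++before;
            }
    return -1;  // b is not inside the grid
}
```

Both functions return 21 for `({4, 3, 2}, {1, 2, 1})`, and they agree for every block of the grid.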
