
If I launch my kernel with a grid whose blocks have the dimensions:

dim3 block_dims(16,16);

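How is each block now split into warps? Do the first two rows of such a block form one warp, or the first two columns, or is the ordering arbitrary?

Assume a GPU of compute capability 2.0.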


2 Answers


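Threads are numbered within a block so that threadIdx.x varies fastest, threadIdx.y second fastest, and threadIdx.z slowest. This is functionally the same as column-major ordering in multidimensional arrays. Warps are constructed sequentially from threads in this order. For a 16x16 block with a warp size of 32, each warp therefore covers two consecutive values of threadIdx.y (with all 16 values of threadIdx.x for each): threadIdx.y = 0 and 1 form warp 0, threadIdx.y = 2 and 3 form warp 1, and so on. The calculation for a 2D block is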

unsigned int tid = threadIdx.x + threadIdx.y * blockDim.x;
unsigned int warpid = tid / warpSize;

This is covered both in the programming guide and the PTX guide.
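The numbering rule above can be checked on the host without a GPU. The following is a minimal sketch in plain C++ that mirrors the index arithmetic; `Dim3`, `kWarpSize`, `linear_tid`, `warp_id`, and `print_warp_layout` are hypothetical names chosen for illustration and are not part of the CUDA API. It assumes a warp size of 32, which is what `warpSize` reports on compute capability 2.0 devices.

```cpp
#include <cstdio>

// Hypothetical host-side stand-in for CUDA's dim3 / threadIdx / blockDim.
struct Dim3 { unsigned x, y, z; };

// Assumption: warp size of 32 (the value of warpSize on compute capability 2.0).
constexpr unsigned kWarpSize = 32;

// Linear thread index within a block: x varies fastest, then y, then z.
constexpr unsigned linear_tid(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Warps are filled from consecutive linear thread indices.
constexpr unsigned warp_id(Dim3 idx, Dim3 dim) {
    return linear_tid(idx, dim) / kWarpSize;
}

// Print the warp index of every thread in a 2D block,
// one output line per value of threadIdx.y.
void print_warp_layout(Dim3 dim) {
    for (unsigned y = 0; y < dim.y; ++y) {
        std::printf("threadIdx.y=%2u:", y);
        for (unsigned x = 0; x < dim.x; ++x)
            std::printf(" %u", warp_id(Dim3{x, y, 0}, dim));
        std::printf("\n");
    }
}
```

Calling `print_warp_layout(Dim3{16, 16, 1})` from a `main` function prints warp 0 for the lines threadIdx.y = 0 and 1, warp 1 for threadIdx.y = 2 and 3, and so on up to warp 7 for threadIdx.y = 14 and 15, which gives 8 warps for the 256 threads of the block.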

answered 2011-05-30T14:23:14.357

To illustrate @talonmies' answer, here are two consecutive warps (dim3 block_dims(16,16); and warpSize = 32) as shown in the Visual Studio "WarpWatch" window:

[Screenshots: first warp, second warp]

answered 2019-02-26T15:08:41.447