cuda - Is it possible to launch a CUDA kernel with the grid size/block size defined at runtime?

Question

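I would like to know whether it is possible to launch a CUDA kernel with the grid/block size specified at runtime, rather than at compile time as is usually done.

Any help with this would be greatly appreciated.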

score 3 · Accepted Answer

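In a CUDA application, hard-coding a fixed grid size is rarely useful. In most cases the block size is fixed, while the grid size is kept dynamic and computed from the size of the input data. Consider the following vector addition example.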

__global__ void kernel(float* a, float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    //Bound checks inside the kernel
    if(tid<length)
       c[tid] = a[tid] + b[tid];
}

int addVectors(float* a, float* b, float* c, int length)
{
   //a, b, c are allocated on the device

   //Fix the block size to an appropriate value
   dim3 block(128);

   dim3 grid;
   grid.x = (length + block.x - 1)/block.x;

   //The grid size depends on the length of the vector.
   //The total number of threads is rounded up to the nearest multiple of the block size,
   //so there are at least as many threads as there are vector elements.

   kernel<<<grid,block>>>(a,b,c,length);

   return 0;
}
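
The block size can be chosen at runtime in the same way, because `dim3` values are ordinary host variables. The following is only a minimal sketch under stated assumptions: `addKernel`, `blocksFor`, `launchAdd` and `threadsPerBlock` are hypothetical names introduced for illustration, and `a`, `b`, `c` are assumed to be device pointers that were allocated elsewhere.

```cuda
#include <cuda_runtime.h>

// Same vector addition kernel as above, repeated so the sketch is self-contained.
__global__ void addKernel(const float* a, const float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Bounds check: the last block may contain threads past the end of the vector.
    if (tid < length)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical helper: number of blocks needed to cover `length` elements
// (integer division rounded up).
inline unsigned int blocksFor(int length, unsigned int threadsPerBlock)
{
    return (static_cast<unsigned int>(length) + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical wrapper: both the block size and the grid size are runtime values.
// Assumes a, b, c point to device memory holding at least `length` floats.
cudaError_t launchAdd(const float* a, const float* b, float* c,
                      int length, unsigned int threadsPerBlock)
{
    if (length <= 0 || threadsPerBlock == 0)
        return cudaErrorInvalidValue;

    dim3 block(threadsPerBlock);
    dim3 grid(blocksFor(length, threadsPerBlock));

    addKernel<<<grid, block>>>(a, b, c, length);

    // Reports launch-time errors, e.g. a block size the device does not support.
    return cudaGetLastError();
}
```

A caller could read `threadsPerBlock` from a command-line argument or a config file and pass it straight to `launchAdd`. The value still has to respect the device's limit on threads per block, so checking the returned `cudaError_t` is worthwhile.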

score 2 · Accepted Answer

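CUDA kernels and device functions can use blockDim.{x,y,z} to access the block configuration and gridDim.{x,y,z} to access the grid configuration. If you have a kernel or device function that can handle a variety of configurations, all you need to do is launch the kernel (myKernel<<<dimGrid,dimBlock>>>) with whatever dimGrid or dimBlock you choose at runtime. I don't think there is anything unusual about this.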
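
One common way to write a kernel that tolerates any launch configuration is a grid-stride loop, which reads `gridDim` and `blockDim` inside the kernel. This is a sketch of that technique, not something taken from the answer above; `scaleKernel`, `launchScale`, `numBlocks` and `threadsPerBlock` are hypothetical names, and `data` is assumed to be a device pointer.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid, so every element is
// processed whatever 1D grid/block size was chosen at launch.
__global__ void scaleKernel(float* data, int length, float factor)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < length; i += stride)
        data[i] *= factor;
}

// Hypothetical wrapper: numBlocks and threadsPerBlock are plain runtime values.
// Assumes `data` points to device memory holding at least `length` floats.
cudaError_t launchScale(float* data, int length, float factor,
                        unsigned int numBlocks, unsigned int threadsPerBlock)
{
    scaleKernel<<<numBlocks, threadsPerBlock>>>(data, length, factor);

    // Reports launch-time errors such as an invalid configuration.
    return cudaGetLastError();
}
```

With this shape, the grid does not have to cover the whole input: a launch with fewer threads than elements simply makes each thread handle several elements.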
