c++ - What is the best way to encapsulate CUDA kernels?

Question

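I am trying to keep a CUDA project as close to an OO design as possible. The solution I have found so far is to use a struct to encapsulate the data, and for every method that needs some GPU processing, to implement three functions:

1. The method that will be called on the object.
2. A __global__ function that calls the struct's __device__ method.
3. The __device__ method inside the struct.

Here is an example. Suppose I need to implement a method that initializes a buffer inside the struct. It would look something like this: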

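struct Foo;
__global__ void initFooKernel(Foo foo);  // forward declaration; defined below

struct Foo
{
   float *buffer_;
   short2 buffer_resolution_;
   short2 block_size_;
   __device__ void initBuffer()
   {
      int x = blockIdx.x * blockDim.x + threadIdx.x;
      int y = blockIdx.y * blockDim.y + threadIdx.y;
      // Guard both axes so threads past the edge do not wrap onto the next row
      if (x < buffer_resolution_.x && y < buffer_resolution_.y)
         buffer_[y * buffer_resolution_.x + x] = 0;
   }
   void init(const short2 &buffer_resolution, const short2 &block_size)
   {
       buffer_resolution_ = buffer_resolution;
       block_size_ = block_size;
       //EDIT1 - Added the cudaMalloc (the size is in bytes, hence sizeof(float))
       cudaMalloc((void **)&buffer_, sizeof(float) * buffer_resolution.x * buffer_resolution.y);
       dim3 threadsPerBlock(block_size.x, block_size.y);
       // Round up so a resolution that is not a multiple of the block size is still covered
       dim3 blocksPerGrid((buffer_resolution.x + threadsPerBlock.x - 1) / threadsPerBlock.x,
                          (buffer_resolution.y + threadsPerBlock.y - 1) / threadsPerBlock.y);
       // Pass the struct by value: `this` is a host pointer and cannot be dereferenced on the device
       initFooKernel<<<blocksPerGrid, threadsPerBlock>>>(*this);
   }
};

__global__ void initFooKernel(Foo foo)
{
   foo.initBuffer();
}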

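I need to do it this way because it looks like I cannot declare a __global__ function inside a struct. I picked up this approach from looking at some open source projects, but implementing three functions for every encapsulated GPU method seems cumbersome. So my question is: is this the best (or only) way to do it? Is it even a valid approach?

EDIT1: I forgot to allocate the buffer with cudaMalloc before calling initFooKernel. Fixed.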

score 3 · Accepted Answer

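Is the goal to build classes that use CUDA while looking like ordinary classes from the outside?

If so, to expand on what O'Conbhui said, you can create C-style calls for the CUDA functionality and then create a class that wraps those calls.

So, in a .cu file, you would put the definitions of the texture references, the kernels, the C-style functions that call the kernels, and the C-style functions that allocate and free GPU memory. In your example, this would include a function that calls a kernel to initialize the GPU memory.

Then, in a corresponding .cpp file, include a header with the declarations of the functions in the .cu file and define the class. In the constructor, call the .cu functions that allocate CUDA memory and set up other CUDA resources (such as textures), including your own memory-initialization function. In the destructor, call the functions that free the CUDA resources. In the member functions, call the functions that launch the kernels.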
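As a rough illustration of that layout, here is a minimal sketch of the .cpp side. The function names (fooAllocBuffer, fooClearBuffer, fooFreeBuffer) and the class shape are hypothetical and not taken from the question. To keep the sketch buildable with a plain C++ compiler, the three C-style functions are given CPU stand-in bodies here; in a real project they would be defined in the .cu file, where they would call cudaMalloc, launch the initialization kernel, and call cudaFree.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Declarations that would live in a header shared by the .cu and .cpp files.
// All three names are hypothetical.
float *fooAllocBuffer(std::size_t count);
void fooClearBuffer(float *buffer, std::size_t count);
void fooFreeBuffer(float *buffer);

// CPU stand-ins so this sketch compiles without nvcc (an assumption made
// for illustration). The real definitions would sit in the .cu file and
// use cudaMalloc, a kernel launch, and cudaFree respectively.
float *fooAllocBuffer(std::size_t count)
{
    return static_cast<float *>(std::malloc(count * sizeof(float)));
}

void fooClearBuffer(float *buffer, std::size_t count)
{
    std::memset(buffer, 0, count * sizeof(float));
}

void fooFreeBuffer(float *buffer)
{
    std::free(buffer);
}

// The class callers see: no CUDA types or launch syntax in its interface.
class Foo
{
public:
    Foo(int width, int height)
        : width_(width), height_(height),
          buffer_(fooAllocBuffer(static_cast<std::size_t>(width) * height))
    {
        fooClearBuffer(buffer_, size());   // allocate, then initialize
    }

    ~Foo() { fooFreeBuffer(buffer_); }     // release the resource

    // The object owns the buffer, so copying is disabled.
    Foo(const Foo &) = delete;
    Foo &operator=(const Foo &) = delete;

    std::size_t size() const { return static_cast<std::size_t>(width_) * height_; }

    // Member functions simply forward to the C-style wrappers.
    void clear() { fooClearBuffer(buffer_, size()); }

    // With real CUDA this would be a device pointer that the host must not
    // dereference; it is readable here only because of the CPU stand-ins.
    const float *data() const { return buffer_; }

private:
    int width_;
    int height_;
    float *buffer_;
};
```

With this split, each GPU operation still needs a kernel plus a wrapper function, but the class itself stays ordinary C++ and only the .cu file has to be compiled by nvcc.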
