cuda - How do I make a kernel function callable from both the host and the device?

Question

The following attempt shows what I am trying to do, but it fails to compile:

__host__ __device__ void f(){}

int main()
{
    f<<<1,1>>>();
}

The compiler complains:

a.cu(5): error: a __device__ function call cannot be configured

1 error detected in the compilation of "/tmp/tmpxft_00001537_00000000-6_a.cpp1.ii".

I hope my question is clear. Thanks for any advice.

score 12 · Accepted Answer

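You need to create a CUDA kernel entry point, i.e. a __global__ function. Only a __global__ function can be launched with <<<...>>>; the __host__ __device__ function is then called from inside the kernel on the device and called directly on the host. Something like: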

#include <stdio.h>

__host__ __device__ void f() {
#ifdef __CUDA_ARCH__
    printf ("Device Thread %d\n", threadIdx.x);
#else
    printf ("Host code!\n");
#endif
}

__global__ void kernel() {
   f();
}

int main() {
   kernel<<<1,1>>>();
   if (cudaDeviceSynchronize() != cudaSuccess) {
       fprintf (stderr, "Cuda call failed\n");
   }
   f();
   return 0;
}
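
The same pattern extends to a helper that returns a value. Below is a minimal sketch, not a definitive implementation: the names HD, square, square_kernel and square_all are hypothetical and do not come from the question. It assumes the file is built with nvcc as a .cu file; the __CUDACC__ guards are only there so that a plain C++ compiler also accepts it and falls back to a host loop.

```cpp
#include <cassert>
#include <cstddef>

// Expand to the CUDA qualifiers only when nvcc is compiling this file.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: one definition, compiled for host and device.
HD inline int square(int x) { return x * x; }

#ifdef __CUDACC__
// Kernel entry point: only a __global__ function accepts <<<...>>>.
__global__ void square_kernel(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = square(in[i]);  // device-side call
}
#endif

// Hypothetical host wrapper: launches the kernel under nvcc,
// runs a plain loop otherwise. Returns true on success.
inline bool square_all(const int *in, int *out, int n) {
    assert(n >= 0);
    if (n == 0) return true;  // nothing to do; avoids a zero-block launch
#ifdef __CUDACC__
    int *d_in = nullptr, *d_out = nullptr;
    std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    bool ok = cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        square_kernel<<<blocks, threads>>>(d_in, d_out, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
#else
    for (int i = 0; i < n; ++i) out[i] = square(in[i]);  // host-side call
    return true;
#endif
}
```

Host code can call square(5) directly and also call square_all(in, out, n) to run the same function on the device. The point of the wrapper is that square itself is never launched; only square_kernel is.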

score -2 · Accepted Answer

The tutorial you are reading is too old (2008?). It may not be compatible with the CUDA version you are using.

You can use __global__, which implies __host__ __device__; this works:

__global__ void f()
{
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;
}

int main()
{
    f<<<1,1>>>();
}
