c++ - CUDA error (209): cudaLaunchKernel returned cudaErrorNoKernelImageForDevice

Question

OS: CentOS 7
CUDA Toolkit version: 11.0

NVIDIA driver and GPU information:

NVIDIA-SMI 450.51.05
Driver Version: 450.51.05
CUDA Version: 11.0
GPU: Quadro M2000M

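I am very new to CUDA programming, so any guidance is much appreciated. I have a very simple CUDA C++ program that computes the sum of two arrays in unified memory on the GPU. However, the kernel appears to fail to launch with a cudaErrorNoKernelImageForDevice error. The code is below: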

using namespace std;
#include <iostream>
#include <math.h>
#include <cuda_runtime_api.h>

__global__
void add(int n, float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

int main() {
    cout << "!!!Hello World!!!" << endl; // prints !!!Hello World!!!

    int N = 1<<20;
    float *x, *y;

    cudaMallocManaged((void**)&x, N*sizeof(float));
    cudaMallocManaged((void**)&y, N*sizeof(float));

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    add<<<1, 1>>>(N, x, y);
    cudaGetLastError();
    /**
     * This indicates that there is no kernel image available that is suitable
     * for the device. This can occur when a user specifies code generation
     * options for a particular CUDA source file that do not include the
     * corresponding device configuration.
     *
     *    cudaErrorNoKernelImageForDevice       =     209,
     */

    cudaDeviceSynchronize();

    float maxError = 0.0f;
    for (int i = 0; i < N; i++) {
        maxError = fmax(maxError, fabs(y[i]-3.0f));
    }

    cudaFree(x);
    cudaFree(y);

    return 0;
}

score 1 · Accepted Answer

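The error here arises because a CUDA kernel must be compiled so that the generated code (PTX or SASS) is compatible with the GPU it runs on. This is a topic with a lot of nuance, so see related questions on this topic (and the links there) for more background.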

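When we want to be precise, a GPU architecture is referred to by its compute capability. You can find your GPU's compute capability with a Google search or by running the deviceQuery CUDA sample code. Compute capability is expressed as (major).(minor), for example compute capability 5.2 or 7.0.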
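If you would rather read the value from a program than look it up, the CUDA runtime reports it through cudaGetDeviceProperties. The following is a minimal sketch, not the deviceQuery sample itself; the file name query.cu in the comment is only a placeholder. It launches no kernels, so it should run regardless of which architecture it was compiled for.

```cuda
// Build (placeholder file name): nvcc query.cu -o query
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }
        // prop.major and prop.minor together form the compute capability.
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```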

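When compiling code, you need to specify a compute capability (if you don't, a default compute capability is implied). If the compute capability you specify at compile time matches your GPU, everything should be fine. However, code built for a newer/higher compute capability generally will not run on a GPU with an older/lower compute capability. In that case, you will see an error like the one you describe: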

cudaErrorNoKernelImageForDevice

209

"no binary for GPU"

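or similar. If you are not doing proper CUDA error checking, you may also see no explicit error at all. The solution is to match the compute capability specified at compile time to the GPU you intend to run on. How to do this depends on the toolchain/IDE you are using. For basic nvcc command-line usage: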

nvcc -arch=sm_XY ...

will specify a compute capability of X.Y.
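As for the error checking mentioned above, here is one way the program from the question could report the failure itself instead of continuing silently. This is a sketch under assumptions: CUDA_CHECK is a hypothetical helper macro, not part of the CUDA toolkit, and aborting on the first error is just one possible policy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): print the failing call and exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",           \
                         __FILE__, __LINE__, #call,                       \
                         cudaGetErrorString(err_), (int)err_);            \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

__global__ void add(int n, float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

int main() {
    int N = 1 << 20;
    float *x = nullptr, *y = nullptr;

    CUDA_CHECK(cudaMallocManaged(&x, N * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&y, N * sizeof(float)));

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    add<<<1, 1>>>(N, x, y);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    return 0;
}
```

With a check right after the launch, a build that does not match the GPU stops at the cudaGetLastError line with the error code and its message, rather than producing wrong results later.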

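For Eclipse/Nsight Eclipse/Nsight Visual Studio, the compute capability can be specified in the project properties. Depending on the tool, it may be expressed as switch values (e.g. compute_XY, sm_XY) or numerically as XY.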
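To make the matching rule concrete, here is a small host-only sketch of the compatibility logic as I understand it from the CUDA documentation: SASS built for sm_XY runs on devices with the same major version and an equal or higher minor version, while PTX built for compute_XY can be JIT-compiled by the driver for any equal or higher compute capability. The type and function names are hypothetical, and the model is simplified (it ignores architecture-specific targets, for instance).

```cpp
#include <string>

// Hypothetical illustration type: a compute capability as (major).(minor).
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

// SASS (sm_XY): usable only within the same major version,
// on devices whose minor version is at least the one it was built for.
bool sassRunsOn(ComputeCapability built, ComputeCapability device) {
    return device.ccMajor == built.ccMajor && device.ccMinor >= built.ccMinor;
}

// PTX (compute_XY): can be JIT-compiled for any device whose
// compute capability is equal to or higher than the one it was built for.
bool ptxRunsOn(ComputeCapability built, ComputeCapability device) {
    if (device.ccMajor != built.ccMajor)
        return device.ccMajor > built.ccMajor;
    return device.ccMinor >= built.ccMinor;
}

// Build the basic nvcc switch for a given compute capability,
// e.g. {7, 0} gives "-arch=sm_70".
std::string archFlag(ComputeCapability cc) {
    return "-arch=sm_" + std::to_string(cc.ccMajor) + std::to_string(cc.ccMinor);
}
```

As an illustrative case (the numbers are assumptions, not taken from the question), code built for 5.2 fails both checks on a 5.0 device, which is the situation that produces the error above; building with archFlag({5, 0}) would target that device directly.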
