cuda - Cannot use printf in a CUDA kernel function

Question

It seems that printf does not work inside a kernel in CUDA code.

#include "Common.h"
#include<cuda.h>
#include <stdio.h>

__global__ void Kernel(float *a_d, float *b_d, int size)
{
    int idx = threadIdx.x ;
    int idy = threadIdx.y ;
    // Allocate a tile in the device's shared memory
    __shared__ float temp[16][16];

    //Copying the data to the shared memory
    temp[idy][idx] = a_d[(idy * (size+1)) + idx] ;


    printf("idx=%d, idy=%d, size=%d\n", idx, idy, size);


    for(int i =1 ; i<size ;i++) {
            if((idy + i) < size) { // NO Thread divergence here
                    float var1 =(-1)*( temp[i-1][i-1]/temp[i+idy][i-1]);
                    temp[i+idy][idx] = temp[i-1][idx] +((var1) * (temp[i+idy ][idx]));
            }
            __syncthreads(); // Synchronize all threads before the next iteration
    }
    b_d[idy*(size+1) + idx] = temp[idy][idx];
}

When compiling, it reports:

 error: calling a host function("printf") from a __device__/__global__ function("Kernel") is not allowed

The CUDA version is 4.

score 7 · Accepted Answer

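To quote the CUDA Programming Guide: "Formatted output is only supported by devices of compute capability 2.x and higher." See the programming guide for more information.

Devices of compute capability below 2.x can use cuPrintf.

If you are on a device of compute capability 2.x or higher and are trying to use printf, make sure you have specified -arch=sm_20 (or higher) when compiling with nvcc. The default is sm_10, which does not have the features needed to support printf.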
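As a minimal sketch of the point about the architecture flag, the program below calls printf from a kernel and is meant to be built for compute capability 2.x or higher. The kernel name `printIndices`, the helper `runDemo`, the file name, and the 2x2 block size are illustrative assumptions, not part of the question's code.

```cuda
// Build (assumed file name): nvcc -arch=sm_20 print_indices.cu -o print_indices
// Recent toolkits no longer accept sm_20; substitute the architecture of your GPU.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its own 2D index, mirroring the printf call in the question.
__global__ void printIndices(int size)
{
    printf("idx=%d, idy=%d, size=%d\n",
           (int)threadIdx.x, (int)threadIdx.y, size);
}

// Hypothetical helper: launches the kernel and returns 0 on success.
int runDemo()
{
    dim3 block(2, 2);                 // 4 threads in one block
    printIndices<<<1, block>>>(2);

    // Device-side printf output is buffered; synchronizing flushes it to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    return runDemo();
}
```

The order in which the four lines appear is not guaranteed, since threads are not required to print in index order.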

NVIDIA provides three source-level debuggers for CUDA. When inspecting variables, you may find these more useful than printf:
- Nsight Visual Studio Edition CUDA debugger
- Nsight Eclipse Edition CUDA debugger
- cuda-gdb

score 4 · Accepted Answer

You need to use cuPrintf, as shown in this example. Note that printf is a very limited way to debug; the Nsight or Nsight Eclipse Edition IDEs are much better.
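For readers without the linked example at hand, here is a rough sketch of how cuPrintf is typically used. It assumes that the files cuPrintf.cu and cuPrintf.cuh from NVIDIA's cuPrintf sample have been copied next to the source file. The kernel name `printIndices` and the launch configuration are illustrative and not taken from the answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include "cuPrintf.cu"  // assumption: copied from NVIDIA's cuPrintf sample

// Same output as the question's printf call, routed through cuPrintf instead.
__global__ void printIndices(int size)
{
    cuPrintf("idx=%d, idy=%d, size=%d\n",
             (int)threadIdx.x, (int)threadIdx.y, size);
}

int main()
{
    // Allocate the device-side buffer that cuPrintf writes into.
    if (cudaPrintfInit() != cudaSuccess) {
        fprintf(stderr, "cudaPrintfInit failed\n");
        return 1;
    }

    dim3 block(2, 2);
    printIndices<<<1, block>>>(2);

    // Copy the buffer back to the host and print it; 'true' prefixes each
    // line with the block and thread that produced it.
    cudaPrintfDisplay(stdout, true);

    // Release the buffer.
    cudaPrintfEnd();
    return 0;
}
```

Unlike device-side printf, nothing appears until the host calls cudaPrintfDisplay, so that call has to come after the kernel launch.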
