c++ - Compiling code containing dynamic parallelism fails

Question

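I am using CUDA 5.5 with an NVIDIA GeForce GTX 780 (compute capability 3.5) to do dynamic parallelism programming. I launch a kernel from inside another kernel, but it gives me an error:

error: calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?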

score 12 · Accepted Answer

Compile for sm_35 with relocatable device code enabled, and link against the device runtime library:

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
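For reference, here is a minimal sketch of what a file such as simple1.cu might contain. The names child_kernel, parent_kernel and run_demo are hypothetical and are not taken from the question; the point is only that a launch from inside a __global__ function is what requires the flags above.

```cuda
// simple1.cu: minimal dynamic parallelism sketch (hypothetical example).
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes twice its global index.
__global__ void child_kernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * i;
}

// Parent kernel: one thread launches the child grid from the device.
// This device-side launch is what needs sm_35, -rdc=true and -lcudadevrt.
__global__ void parent_kernel(int *out, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child_kernel<<<(n + 63) / 64, 64>>>(out, n);
    }
}

// Returns 0 if the launch succeeded and the result is as expected, 1 otherwise.
int run_demo(void) {
    const int n = 128;
    int host[n];
    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return 1;

    parent_kernel<<<1, 1>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    // The parent grid is not complete until its child grids have finished,
    // so a host-side synchronize waits for both.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2 * i) return 1;
    }
    return 0;
}

int main(void) {
    int status = run_demo();
    printf(status == 0 ? "ok\n" : "failed\n");
    return status;
}
```

Compiling this file without -arch=sm_35 (or a newer architecture) should reproduce the error from the question, since the default target of CUDA 5.5 is older than compute_35.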

or

If you have two files, simple1.cu and test.c, you can build them as shown below. This is called separate compilation.

nvcc -arch=sm_35 -dc simple1.cu 
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
g++ -c test.c 
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

This is also explained in the CUDA Programming Guide.
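For the two-file build, the .cu file has to expose an ordinary host function that test.c can call. The sketch below assumes a hypothetical entry point named run_parent and hypothetical kernels; none of these names come from the original answer.

```cuda
// simple1.cu for the separate-compilation build (hypothetical contents).
// test.c would declare `int run_parent(int *host_data, int n);` and call it.
// The build above compiles test.c with g++, i.e. as C++, so a plain
// declaration links. With a C compiler the function would need extern "C".
#include <cuda_runtime.h>

// Child kernel: increments every element.
__global__ void child_kernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Parent kernel: launches the child grid from the device.
__global__ void parent_kernel(int *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child_kernel<<<(n + 63) / 64, 64>>>(data, n);
    }
}

// Host entry point: copies data to the GPU, runs the kernels, copies back.
// Returns 0 on success and 1 on any CUDA error.
int run_parent(int *host_data, int n) {
    int *d = NULL;
    size_t bytes = (size_t)n * sizeof(int);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return 1;

    cudaError_t err = cudaMemcpy(d, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        parent_kernel<<<1, 1>>>(d, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Keeping all CUDA syntax inside the .cu file means test.c needs no CUDA headers and can be built by the host compiler alone, which is why the final link step only has to add -lcudart.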

score 7 · Accepted Answer

In Visual Studio 2010:

1) View -> Property Pages
2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true)
3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35
4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib

score 4 · Accepted Answer

You need to have nvcc generate CC 3.5 code for your device. This can be done by adding this option to the nvcc command line.

 -gencode arch=compute_35,code=sm_35

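You can look at the CUDA samples on dynamic parallelism for more details. They contain the command-line options and project settings for all supported operating systems.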

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-
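Since the compile-time target has to match what the card supports, it can also help to confirm the device's compute capability at run time. The following is a small sketch using cudaGetDeviceProperties; the helper name supports_dynamic_parallelism is an assumption made for illustration.

```cuda
// Checks whether a device reports compute capability 3.5 or higher,
// which is the minimum for dynamic parallelism.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns false if the query fails or the device is too old.
bool supports_dynamic_parallelism(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return prop.major > 3 || (prop.major == 3 && prop.minor >= 5);
}

int main(void) {
    if (supports_dynamic_parallelism(0)) {
        printf("Device 0 supports dynamic parallelism\n");
    } else {
        printf("Device 0 does not support dynamic parallelism\n");
    }
    return 0;
}
```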
