cuda - Changing the -arch flag in CUDA makes my kernel use more registers

Question

I have been writing a kernel on my Tesla K20m. When I compile it with -Xptxas=-v, I get the following output:

ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_10'
ptxas info    : Used 8 registers, 80 bytes smem, 8 bytes cmem[1]

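As you can see, only 8 registers are used. However, if I add -arch=sm_35, my kernel's execution time increases drastically and it uses more registers. I would like to know why.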

nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_35'
ptxas info    : Function properties for _Z9searchKMPPciPhiPiS1_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 21 registers, 16 bytes smem, 368 bytes cmem[0]

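Several books state that compiling for the card's actual architecture improves performance, so I would like to know why performance on my card drops so drastically instead.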

Thanks.

Edit: a similar question and answer: "Registers and shared memory depending on compiling compute capability?"

score 3 · Accepted Answer

Compiling for sm_20 and above enables IEEE-compliant math (correctly rounded division and square root, denormal support) and ABI compliance. Both can increase the register count and reduce performance, and both can be disabled.
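A minimal sketch of relaxing the IEEE-math part for this build, assuming a CUDA toolkit whose nvcc accepts the documented `-ftz`, `-prec-div`, `-prec-sqrt` and `--use_fast_math` options. ABI compliance is controlled separately through ptxas and is not covered here, and whether these flags bring this particular kernel back to the sm_10 register count is not guaranteed.

```shell
#!/bin/sh
# Baseline: sm_35 with the default IEEE-compliant math
# (equivalent to -ftz=false -prec-div=true -prec-sqrt=true).
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35

# Relax IEEE math explicitly: flush denormals to zero and use
# approximate (not IEEE-rounded) division and square root.
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 -ftz=true -prec-div=false -prec-sqrt=false

# Umbrella switch: implies the three flags above (plus -fmad=true) and
# also maps some math library calls to faster, less precise intrinsics.
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35 --use_fast_math
```

Compare the "Used N registers" line that ptxas prints for each build and time the kernel under each one; results from the relaxed builds may differ slightly from the IEEE-compliant build.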
