2

我正在使用具有计算能力 1.2 的 GeForce 310 GPU 的nvcc选项编译 CUDA 程序。-arch=20 -code=20该程序似乎正常运行如下。

wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ make
nvcc -O --ptxas-options=-v 1D-EncodeV6.1.cu -o 1D-EncodeV6.1 -I../../NVIDIA_GPU_Computing_SDK/C/common/inc -I../../NVIDIA_GPU_Computing_SDK/shared/inc  -arch=compute_20 -code=sm_20 
ptxas info    : Compiling entry function '_Z6EncodePhPjS0_S_S_' for 'sm_20'
ptxas info    : Function properties for _Z6EncodePhPjS0_S_S_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 52 bytes cmem[0]
wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ ./1D-EncodeV6.1 
########################### Encoding start (loopCount=10)#######################
#p  n   size    averageTime(s)  averageThroughput(MB/s) errorRate(0~1)
#================= Encode on GPU v6.1 ===============
4   4   4   0.000294    0.051837    100.000000
#################### Encoding stop #########################

所以,我想知道:

  1. 为什么这个程序可以在带有与显卡计算能力 1.2 不匹配的nvcc选项的 GeForce 310 上运行?-arch=compute_20 -code=sm_20
  2. 如果期权的价值与-arch期权的价值不同,会发生什么-code

谢谢。

4

1 回答 1

3

A CUDA executable typically contains 2 types of program data: SASS code which is basically GPU machine code, and PTX which is an intermediate code (although it's pretty close to machine code). As long as PTX code is present in the executable, then if the driver decides that a proper SASS binary is not available for the GPU that the code will actually run on, it will do a "JIT-compile" step at application launch, to create the necessary binary code appropriate for the device in question, using the PTX code in the application package.

This is what is happening in your case.

If arch != code, then you're creating device code that architecturally conforms to the arch type, but is compiled to use machine level instructions that are associated with the code type. For example, if I compile for arch = 1.2 and code = 2.0, I cannot use double types (they will be demoted to float, because double is not supported in a 1.2 architecture) but the SASS machine code generated will be ready to execute on a cc 2.0 device, and will not require a JIT-compile step for that kind of device.

The NVCC manual has more information particularly the section on steering code generation.

于 2013-03-30T04:09:13.557 回答