I want to try out 128-bit global stores (the gst_inst_128bit counter). For the same program, nvvp reports many gst_inst_128bit instructions executed, while the profiler in Nsight reports gst_inst_32bit instructions instead, four times as many. It should be the same program. How can this happen?

The experiment was run on Linux with CUDA 5.0 on a GTX 580. The program only copies data from one array to another in the kernel. In main:

cudaMalloc((void**)&dev_a, NUM * sizeof(float));
cudaMalloc((void**)&dev_b, NUM * sizeof(float));
// reinterpret the float arrays as uint4 so each element access is 16 bytes
kernel<<<grid,block>>>((uint4 *)dev_a, (uint4 *)dev_b);

The kernel:

__global__ void kernel(uint4 *a, uint4 *b){
        unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
        // each iteration copies one 16-byte uint4 (four floats) per thread
        for(unsigned int i = 0;i < LOOP/4;i++){
                b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
        }
        return;
}
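For reference, a self-contained version of the program above might look like the sketch below. The question does not give THREAD_NUM, GRID_NUM, or LOOP, so the values here are hypothetical; NUM is derived so that GRID_NUM * THREAD_NUM threads, each copying LOOP/4 uint4 values (LOOP floats), cover both arrays exactly. The CHECK macro and the host-side verification are additions for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical sizes: the original question does not state them.
#define THREAD_NUM 256                          // threads per block
#define GRID_NUM   64                           // blocks per grid
#define LOOP       16                           // floats copied per thread
#define NUM (GRID_NUM * THREAD_NUM * LOOP)      // floats in each array

#if (LOOP % 4) != 0
#error "LOOP must be a multiple of 4 (one uint4 holds four floats)"
#endif

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t err_ = (call); if (err_ != cudaSuccess) { \
    fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_));      \
    exit(EXIT_FAILURE); } } while (0)

__global__ void kernel(const uint4 *a, uint4 *b)
{
    unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
    // Each iteration moves one 16-byte uint4; consecutive threads touch
    // consecutive uint4 elements, so the accesses are coalesced.
    for (unsigned int i = 0; i < LOOP / 4; i++) {
        b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
    }
}

int main(void)
{
    float *h_a = (float *)malloc(NUM * sizeof(float));
    float *h_b = (float *)malloc(NUM * sizeof(float));
    for (int k = 0; k < NUM; k++) h_a[k] = (float)k;

    // cudaMalloc returns memory aligned to at least 256 bytes, so viewing
    // the float buffers as uint4 (16-byte aligned) is safe.
    float *dev_a, *dev_b;
    CHECK(cudaMalloc((void **)&dev_a, NUM * sizeof(float)));
    CHECK(cudaMalloc((void **)&dev_b, NUM * sizeof(float)));
    CHECK(cudaMemcpy(dev_a, h_a, NUM * sizeof(float), cudaMemcpyHostToDevice));

    dim3 grid(GRID_NUM), block(THREAD_NUM);
    kernel<<<grid, block>>>((uint4 *)dev_a, (uint4 *)dev_b);
    CHECK(cudaGetLastError());
    // cudaMemcpy waits for the kernel, so execution errors surface here too.
    CHECK(cudaMemcpy(h_b, dev_b, NUM * sizeof(float), cudaMemcpyDeviceToHost));

    int errors = 0;
    for (int k = 0; k < NUM; k++) if (h_b[k] != h_a[k]) errors++;
    printf("%s (%d mismatches)\n", errors ? "FAILED" : "OK", errors);

    cudaFree(dev_a); cudaFree(dev_b);
    free(h_a); free(h_b);
    return errors ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

Building this once normally and once with nvcc -G gives two executables that can each be opened in nvvp and Nsight for a like-for-like comparison.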

1 Answer

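The profiler in Nsight EE and the standalone Visual Profiler on Linux are built from the same code base. Please make sure that:

  1. You are using the same executable.
  2. There are no differences in environment variable values (e.g. LD_LIBRARY_PATH).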
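To check both points from inside the application, a minimal sketch like the one below could print the path of the running binary, LD_LIBRARY_PATH, and the CUDA runtime and driver versions at startup, so the output of the nvvp run and the Nsight EE run can be compared side by side. It assumes Linux (it reads /proc/self/exe); print_run_fingerprint and env_or_unset are hypothetical helper names, not part of any CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <unistd.h>          // readlink (POSIX)
#include <cuda_runtime.h>

// Returns the value of an environment variable, or "(unset)" if it is not defined.
static const char *env_or_unset(const char *name)
{
    const char *value = getenv(name);
    return value ? value : "(unset)";
}

// Prints facts that should be identical between the nvvp run and the Nsight EE run.
static void print_run_fingerprint(void)
{
    char exe[4096];
    // Linux-specific: /proc/self/exe is a symlink to the running binary.
    ssize_t len = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
    if (len < 0) len = 0;
    exe[len] = '\0';                 // readlink does not null-terminate

    int runtime = 0, driver = 0;
    cudaRuntimeGetVersion(&runtime); // version of the CUDA runtime actually loaded
    cudaDriverGetVersion(&driver);   // latest CUDA version the installed driver supports

    fprintf(stderr, "executable:      %s\n", len ? exe : "(unknown)");
    fprintf(stderr, "LD_LIBRARY_PATH: %s\n", env_or_unset("LD_LIBRARY_PATH"));
    fprintf(stderr, "CUDA runtime %d, driver %d\n", runtime, driver);
}

int main(void)
{
    print_run_fingerprint();
    // ... allocate, launch the kernel, and copy back as in the question ...
    return 0;
}
```

If the two profiling runs print different paths or library paths, the counter differences are most likely not coming from the same binary and libraries.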

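Note that the Nsight EE launch UI can be somewhat confusing. If you click "Profile" after debugging a debug build, it may actually profile the debug executable, because it tries to preserve all the custom launch settings you may have set (e.g. command-line arguments, working folder, etc.). Click Run -> Profile Configurations... in the main menu to see the settings Nsight uses when profiling the application.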
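If the debug build is the suspect, one plausible (unconfirmed) explanation is that device code compiled with nvcc -G is not optimized and may perform each uint4 copy as four separate 32-bit stores, which would match the 4x gst_inst_32bit count. The minimal sketch below makes a binary report whether it was built in device-debug mode. It assumes nvcc defines __CUDACC_DEBUG__ under -G and that the macro is visible in the host compilation pass; recent toolkits document this macro, but CUDA 5.0 may not define it. built_with_device_debug is a hypothetical helper name.

```cuda
#include <cstdio>

// Returns 1 if this file was compiled in device-debug mode (nvcc -G), else 0.
// Assumption: relies on nvcc defining __CUDACC_DEBUG__ under -G; on toolkits
// that do not define it, this always reports 0.
static int built_with_device_debug(void)
{
#ifdef __CUDACC_DEBUG__
    return 1;
#else
    return 0;
#endif
}

int main(void)
{
    printf("device-debug build (-G): %s\n", built_with_device_debug() ? "yes" : "no");
    return 0;
}
```

Independently of the macro, running cuobjdump -sass on both executables and comparing the store instructions in the kernel shows directly whether one build uses 128-bit stores and the other 32-bit ones.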

answered 2013-01-11T17:21:47.513