debugging - CUDA: cannot see shared memory values when debugging with Nsight

Question

I have been struggling with a problem I cannot seem to solve. When I debug my CUDA code with Nvidia Nsight under Visual Studio 2008, I get strange results when using shared memory.

My code is:

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   /**** Breakpoint (1) here ****/
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here (v is declared and computed in the omitted code) */
   }
   return v;
}

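When I reach breakpoint 1 and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array change; the others remain null. I would expect all of the first 64 to change. (Screenshot: the Watch window in Visual Studio)

I thought this might be because the warps do not all execute at the same time, so I tried synchronizing them. I added this code inside integrate():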

template<typename T>
__device__
T integrate()
{
   /* Old code is still here */

   __syncthreads();
   /**** Breakpoint (2) here ****/
   if(index < 64 && blockIdx.x==0) {
      T tmp = s_test[index]; // Write to tmp variable so I can inspect it inside Nsight Watch window
      v = tmp + index; // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
   }
   return v;
}

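But the problem is still there. Moreover, the remaining values in tmp are 0, which differs from what the VS Watch window shows. (Screenshot: the Nsight Watch window)

I should mention that it takes a lot of steps to step over __syncthreads(), so when I reach it I just jump to breakpoint 2. What is going on!?

EDIT: Information about the system and launch configuration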
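For context on the warp hypothesis: CUDA groups the threads of a block into warps of 32 by linear thread index, so the 64 writing threads of a 16x16 block fall into two of its eight warps. The arithmetic can be sketched as follows; `warps_touching` is a hypothetical helper introduced only for illustration, and the warp size of 32 is hard-coded as an assumption.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original code): returns how many warps
// of a block contain at least one thread whose linear index is below `limit`.
// Assumes a warp size of 32 and the linearization x + y * block.x.
int warps_touching(dim3 block, int limit)
{
    const int warp_size = 32;
    int threads = (int)(block.x * block.y * block.z);
    int active = limit < threads ? limit : threads;   // threads that pass the index test
    return (active + warp_size - 1) / warp_size;      // round up to whole warps
}
```

With the block size from the question, `warps_touching(dim3(16, 16, 1), 64)` evaluates to 2, so a value written by a thread in one warp is only guaranteed to be visible to the other warps after a barrier.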

System

Name Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Architecture x86
Frequency 2,666 MHz
Number of Cores 2
Page Size 4,096
Total Physical Memory 3,582.00 MB
Available Physical Memory 1,983.00 MB
Version Name Windows 7 Ultimate
Version Number 6.1.7600

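Device GeForce 9500 GT

Driver Version 301.42
Driver Model WDDM
CUDA Device Index 0
GPU Family G96
Compute Capability 1.1
Number of SMs 4
Frame Buffer Physical Size (MB) 512
Frame Buffer Bandwidth (GB/s) 16
Frame Buffer Bus Width (bits) 128
Frame Buffer Location Dedicated
Graphics Clock (MHz) 812
Memory Clock (MHz) 500
Processor Clock (MHz) 1625
Memory Type DDR2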

IDE

Microsoft Visual Studio Team System 2008
NVIDIA Nsight Visual Studio Edition, Version 2.2 Build Number 2.2.0.12255

Compiler command

1> "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -maxrregcount=0 --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

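Launch configuration. The shared memory size does not seem to matter; I have tried several versions. The one I have worked with most is:

Shared memory 2048 bytes
Grid/Block sizes: {101, 101, 1}, {16, 16, 1}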
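The launch configuration above could be expressed on the host roughly as follows. This is only a sketch: `fill_shared` and `launch_like_question` are hypothetical names, `float` stands in for the template parameter `T`, and the actual kernel from the question is not shown in the post. The dynamic shared memory size is the third launch parameter, and 64 floats (256 bytes) fit comfortably within the 2048 bytes requested.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question: the first `count`
// threads of each block write their linear index to dynamic shared memory.
__global__ void fill_shared(int count)
{
    extern __shared__ float s_test[];   // size comes from the third launch parameter
    int index = threadIdx.x + threadIdx.y * blockDim.x;
    if (index < count) {
        s_test[index] = (float)index;
    }
}

// Launches with the grid, block, and shared memory size quoted in the
// question and returns the resulting CUDA status.
cudaError_t launch_like_question()
{
    const dim3 grid(101, 101, 1);
    const dim3 block(16, 16, 1);
    const size_t shared_bytes = 2048;
    const int count = 64;

    // Guard against indexing past the dynamically allocated array.
    if (count * sizeof(float) > shared_bytes) {
        return cudaErrorInvalidValue;
    }

    fill_shared<<<grid, block, shared_bytes>>>(count);
    cudaError_t err = cudaGetLastError();   // reports launch configuration errors
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();         // reports errors raised during execution
}
```

Checking the status returned by `launch_like_question()` against `cudaSuccess` helps rule out a failed launch before looking for the cause in the debugger.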

score 1 · Accepted Answer

Have you tried putting __syncthreads() after assigning the values?

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here (v is declared and computed in the omitted code) */
   }
   __syncthreads();
   /**** Breakpoint (1) here ****/
   return v;
}

Then try inspecting the values at this breakpoint.
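One way to cross-check what the debugger shows is to copy the shared memory contents out to global memory after the barrier and compare them on the host. The sketch below assumes `float` data and a single 16x16 block; `write_sync_read` and `count_shared_mismatches` are hypothetical names that do not appear in the original code.

```cuda
#include <cuda_runtime.h>

// Each of the first `count` threads writes its index to shared memory, the
// whole block synchronizes, and then each thread reads an element written by
// a different thread and stores it in global memory.
__global__ void write_sync_read(float *out, int count)
{
    extern __shared__ float s_test[];
    int index = threadIdx.x + threadIdx.y * blockDim.x;
    if (index < count) {
        s_test[index] = (float)index;
    }
    __syncthreads();   // reached by every thread of the block, outside the branch
    if (index < count) {
        out[index] = s_test[count - 1 - index];
    }
}

// Runs the kernel and returns the number of elements that differ from the
// expected value, or -1 if a CUDA call fails.
int count_shared_mismatches()
{
    const int count = 64;
    float host[count];
    float *dev = NULL;
    if (cudaMalloc((void **)&dev, count * sizeof(float)) != cudaSuccess) {
        return -1;
    }
    write_sync_read<<<1, dim3(16, 16, 1), count * sizeof(float)>>>(dev, count);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host, dev, count * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        return -1;
    }
    int mismatches = 0;
    for (int i = 0; i < count; ++i) {
        if (host[i] != (float)(count - 1 - i)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If `count_shared_mismatches()` returns 0, all 64 values were written and visible after the barrier, which would suggest that any discrepancy lies in what the Watch window displays rather than in the kernel itself.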
