cuda - cuda-gdb sees only one, the least capable, of four available CUDA-capable devices

Question

There are four CUDA-capable devices available:

teslabot$ ./deviceQuery | grep -i "device [0-9]\|capability"
Device 0: "Tesla C2050 / C2070"
  CUDA Capability Major/Minor version number:    2.0
Device 1: "Tesla C2050 / C2070"
  CUDA Capability Major/Minor version number:    2.0
Device 2: "GeForce GTX 295"
  CUDA Capability Major/Minor version number:    1.3
Device 3: "GeForce GTX 295"
  CUDA Capability Major/Minor version number:    1.3
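
To compare this listing with what the debugger reports, it can help to put each device and its compute capability on one line. The following is a minimal sketch, assuming the deviceQuery output format shown above; `summarize_devices` is a hypothetical helper, not part of the CUDA toolkit.

```shell
#!/bin/sh
# summarize_devices: hypothetical helper, not part of the CUDA toolkit.
# Reads deviceQuery-style text on stdin and prints one line per device,
# pairing the device header with its compute capability as an sm_XY label.
summarize_devices() {
    awk '
        /^Device [0-9]+:/ { dev = $0 }   # remember the latest device header
        /CUDA Capability/ {
            cap = $NF                    # last field, e.g. "2.0"
            sub(/\./, "", cap)           # "2.0" becomes "20"
            print dev " (sm_" cap ")"
        }
    '
}

# Sample input copied from the listing above; on a real machine one would
# pipe the output of ./deviceQuery into the function instead.
summarize_devices <<'EOF'
Device 0: "Tesla C2050 / C2070"
  CUDA Capability Major/Minor version number:    2.0
Device 2: "GeForce GTX 295"
  CUDA Capability Major/Minor version number:    1.3
EOF
```

The `sm_XY` label uses the same naming as the `SM Type` column printed by `info cuda devices`, which makes the two listings easier to match up.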

cuda-gdb sees only one of them:

teslabot$ cuda-gdb vector_add
NVIDIA (R) CUDA Debugger
4.0 release
Portions Copyright (C) 2007-2011 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
[...]
(cuda-gdb) break vector_add_gpu
Breakpoint 1 at 0x400ddb: file vector_add.cu, line 7.
(cuda-gdb) run
[...]
(cuda-gdb) info cuda devices
  Dev Description SM Type SMs Warps/SM Lanes/Warp Max Regs/Lane Active SMs Mask
*   0       gt200   sm_13  30       32         32           128 0x00000001

I have checked that code built with -gencode arch=compute_20,code=sm_20 compiles without errors on this machine, and that printf works inside a CUDA kernel when the code is compiled for sm_20.

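How can I make cuda-gdb see all of the devices (perhaps except the one used for graphics, although in this case I am logged in remotely over SSH), or at least one of the Tesla / sm_20 devices?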

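When I follow the suggestion in Michael Foukarakis's response and set the CUDA_VISIBLE_DEVICES environment variable to contain only "0,1" (i.e. make only the Teslas visible), I get the following error after running info cuda devices: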

(cuda-gdb) info cuda devices
fatal:  All CUDA devices are used for X11 and cannot be used while debugging. (error code = 24)
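
For reference, the mask can be applied to a single debugger session rather than exported for the whole shell. This is a minimal sketch using the standard `VAR=value command` shell syntax; `run_with_devices` is a hypothetical helper name, and the stand-in child command only echoes the variable so the sketch runs without a GPU.

```shell
#!/bin/sh
# run_with_devices: hypothetical helper. Runs a command with
# CUDA_VISIBLE_DEVICES set only in that command's environment.
run_with_devices() {
    mask=$1
    shift
    CUDA_VISIBLE_DEVICES=$mask "$@"
}

# Intended use on the machine from the question (not executed here):
#   run_with_devices 0,1 cuda-gdb vector_add

# Stand-in child process that reports what it was given.
run_with_devices 0,1 sh -c 'echo "child sees: $CUDA_VISIBLE_DEVICES"'
```

Scoping the variable this way keeps other programs started from the same shell on their default device numbering.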

How can I check which devices are used by X11 (X.Org), and how can I make the X Window System use a GeForce instead of a Tesla?

score 2 · Accepted Answer

Can you make sure that the CUDA_VISIBLE_DEVICES environment variable contains all the devices you want to use? For example:

$ ./deviceQuery -noprompt | egrep "^Device"
Device 0: "Tesla C2050"
Device 1: "Tesla C1060"
Device 2: "Quadro FX 3800"

By setting the variable, you can make only a subset of them visible to the runtime:

$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./deviceQuery -noprompt | egrep "^Device"
Device 0: "Tesla C2050"
Device 1: "Quadro FX 3800"
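
Note that the visible devices are renumbered from 0 in the order given in the variable, which is why the Quadro moves from index 2 to index 1 above. The renumbering can be sketched in plain shell without a GPU; `remap_devices` is a hypothetical helper that only mimics the mapping for valid indices and does not query the CUDA runtime.

```shell
#!/bin/sh
# remap_devices MASK NAME...: hypothetical helper that mimics how a
# comma-separated mask such as "0,2" selects and renumbers devices.
# NAME arguments are the device names in their original order (index 0 first).
# Assumption: every index in MASK is valid; error handling is omitted.
remap_devices() {
    mask=$1
    shift
    new=0
    for idx in $(printf '%s' "$mask" | tr ',' ' '); do
        # Look up the positional parameter at original index idx (1-based).
        eval "name=\${$((idx + 1))}"
        printf 'Device %d: "%s"\n' "$new" "$name"
        new=$((new + 1))
    done
}

remap_devices "0,2" "Tesla C2050" "Tesla C1060" "Quadro FX 3800"
```

With the three devices from the example, the sketch prints the same two lines as the filtered deviceQuery output. Any index passed to a CUDA program or to cuda-gdb afterwards refers to the new numbering, not the original one.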
