windows - cudaGetDeviceCount returns 1 instead of 2

Question

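I have a GPU cluster with 2 Tesla M2050s, and when I run my code, cudaGetDeviceCount returns only 1. If I try to select device 1 with cudaSetDevice, I get this error: invalid device ordinal. Both devices are listed in the Windows Device Manager. In case it helps, here is my source code: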

cutilSafeCall(cudaGetDeviceCount(&num_devices));

for (device = 0; device < num_devices; device++) {
      cudaDeviceProp properties;
      cudaGetDeviceProperties(&properties, device);
      printf("Device ID:\t%d\n", device);
      printf("Device Name:\t%s\n", properties.name );
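      /* totalGlobalMem and totalConstMem are size_t, so cast and print with %llu instead of %d */
      printf("Global memory:\t%llu\n", (unsigned long long)properties.totalGlobalMem );
      printf("Constant memory:\t%llu\n", (unsigned long long)properties.totalConstMem );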
      printf("Warp size:\t%d\n", properties.warpSize );
}
devs=0;
ParseArguments(argc, argv);
cutilSafeCall(cudaSetDevice(devs));

Any help would be greatly appreciated.

Edit: output of deviceQuery.exe

deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "Tesla M2050"   
CUDA Driver Version: 5.50
CUDA Runtime Version:                          4.20   
CUDA Capability Major/Minor version number:    2.0  
...
...

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.50, CUDA Runtime Version = 4.20, NumDevs = 1, Device = Tesla M2050


PASSED

Press <Enter> to Quit...
-----------------------------------------------------------

score 1 · Accepted Answer

If you have two CUDA GPUs in a single node and deviceQuery reports only one, consider the following possibilities:

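Run nvidia-smi to check that both GPUs are working properly; if it shows only one, check that the other card is seated correctly.
Check that the environment variable CUDA_VISIBLE_DEVICES is not set, since it restricts which GPUs CUDA applications can see.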
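
The second check can be done from inside the program before any CUDA call is made. The following is a minimal sketch in plain C, not part of the original answer: the helper names count_mask_entries and report_visible_devices are hypothetical, and the code only counts the comma-separated entries in CUDA_VISIBLE_DEVICES without validating them, so the number it prints is an upper bound on what cudaGetDeviceCount would report.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns -1 when mask is NULL (variable unset, so
   nothing is hidden), 0 for an empty string, otherwise the number of
   comma-separated entries. Entries are counted, not validated. */
static int count_mask_entries(const char *mask)
{
    if (mask == NULL)
        return -1;
    if (*mask == '\0')
        return 0;

    int entries = 1;
    for (const char *p = mask; *p != '\0'; ++p) {
        if (*p == ',')
            ++entries;
    }
    return entries;
}

/* Hypothetical helper: prints how CUDA_VISIBLE_DEVICES is set for the
   current process. Call it before the first CUDA runtime call. */
void report_visible_devices(void)
{
    const char *mask = getenv("CUDA_VISIBLE_DEVICES");
    int entries = count_mask_entries(mask);

    if (entries < 0)
        printf("CUDA_VISIBLE_DEVICES is not set: no GPUs are hidden by it.\n");
    else
        printf("CUDA_VISIBLE_DEVICES=\"%s\": at most %d GPU(s) visible to CUDA.\n",
               mask, entries);
}
```

Calling report_visible_devices() at the top of main, just before cudaGetDeviceCount, would show whether a value such as "0" is limiting the process to a single GPU. If the variable turns out to be unset, the nvidia-smi check in the first point is the more likely lead.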
