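Situation
I have a server with two GPUs (Ubuntu 12.04) in which I swapped a Tesla C1060 for a GTX 670. I then installed CUDA 5.0 over the existing 4.2 installation. After that, I compiled all the samples except simpleMPI without errors. But when I run ./deviceQuery, I get the following error message: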
foo@bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
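What I have tried
To fix this, I tried every suggestion recommended for the "no CUDA-capable device" error, without success:
/dev/nvidia* exist, with permissions 666 (crw-rw-rw-) and owner root:root:
foo@bar-serv2:/dev$ ls -l nvidia*
crw-rw-rw- 1 root root 195,   0 Oct 24 18:51 nvidia0
crw-rw-rw- 1 root root 195,   1 Oct 24 18:51 nvidia1
crw-rw-rw- 1 root root 195, 255 Oct 24 18:50 nvidiactl
I tried running the code with sudo
The CUDA 5.0 installer installed the driver and the libraries together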
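The device-node check from the list above can also be scripted. This is only a minimal sketch in Python, assuming the nodes live under /dev/nvidia*; the helper name describe_nodes is hypothetical and exists purely for illustration.

```python
import glob
import os
import stat


def describe_nodes(pattern="/dev/nvidia*"):
    """Return (path, mode string, owner uid, owner gid) for each match.

    describe_nodes is a hypothetical helper, not part of any NVIDIA tool.
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        # stat.filemode renders the mode the way `ls -l` does, e.g. crw-rw-rw-
        rows.append((path, stat.filemode(st.st_mode), st.st_uid, st.st_gid))
    return rows


if __name__ == "__main__":
    nodes = describe_nodes()
    if not nodes:
        print("no /dev/nvidia* nodes found")
    for path, mode, uid, gid in nodes:
        print(mode, uid, gid, path)
```

A uid and gid of 0 correspond to root:root. Note that the nodes being present with mode 666 only rules out a permissions problem; it says nothing about whether the driver itself initializes.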
PS: here is the output of lspci | grep -i nvidia:
foo@bar-serv2:/dev$ lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 670] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation G94 [Quadro FX 1800] (rev a1)
[Update]
foo@bar-serv2:~/NVIDIA_CUDA-5.0_Samples/bin/linux/release$ nvidia-smi -a
NVIDIA: API mismatch: the NVIDIA kernel module has version 295.59,
but this NVIDIA driver component has version 304.54. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
Failed to initialize NVML: Unknown Error
How is that possible if I used the CUDA 5.0 installer to install both the driver and the libraries? Could the old 4.2 installation be interfering?
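To make the mismatch easier to check, the two version numbers can be pulled out of the nvidia-smi message and compared. This is a minimal sketch in Python, assuming the message keeps the wording shown in the update; parse_api_mismatch and SAMPLE are hypothetical names used only for illustration.

```python
import re

# The message text copied from the nvidia-smi output in the update.
SAMPLE = """NVIDIA: API mismatch: the NVIDIA kernel module has version 295.59,
but this NVIDIA driver component has version 304.54. Please make
sure that the kernel module and all NVIDIA driver components
have the same version."""


def parse_api_mismatch(text):
    """Return (kernel_module_version, userspace_version), or None if absent.

    parse_api_mismatch is a hypothetical helper; it assumes the exact
    wording of the message quoted above.
    """
    # Collapse line breaks so the pattern can span the wrapped message.
    flat = " ".join(text.split())
    match = re.search(
        r"kernel module has version (\d+(?:\.\d+)*), "
        r"but this NVIDIA driver component has version (\d+(?:\.\d+)*)",
        flat,
    )
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    kernel, user = parse_api_mismatch(SAMPLE)
    print("kernel module:       ", kernel)
    print("user-space component:", user)
    print("match" if kernel == user else "mismatch")
```

On a live Linux system, the version of the kernel module that is currently loaded can also be read from /proc/driver/nvidia/version and compared against the number the user-space tools report.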