I finally managed to get CUDA working on a Microsoft Azure server with a Tesla K80. Now I need to get cuDNN working, but TensorFlow doesn't seem to load it.

This is the output from TensorFlow:

>>> import tensorflow as tf
>>> tf.Session()

2017-04-27 13:05:51.476251: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476306: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476338: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476366: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476394: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:58.164781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID ad52:00:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-04-27 13:05:58.164822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-27 13:05:58.164835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-27 13:05:58.164853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: ad52:00:00.0)
<tensorflow.python.client.session.Session object at 0x7fc3c76c0050>

So as far as I can see, the cuDNN library is not being loaded.

I have the correct files in /cuda-8.0/include/ and /cuda-8.0/lib64/:

$ ls /usr/local/cuda-8.0/include/ | grep "cudnn"
cudnn.h

$ ls /usr/local/cuda-8.0/lib64/ | grep "cudnn"
libcudnn.so
libcudnn.so.5
libcudnn.so.5.1.10
libcudnn_static.a

My ~/.bashrc file has the correct paths:

export CUDA_HOME=/usr/local/cuda8.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}

EDIT: I changed .bashrc to:

export CUDA_HOME=/usr/local/cuda-8.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
export PATH=${CUDA_HOME}/include:${PATH}

Still no luck.
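
To rule out a path typo in the exports above (the first version pointed at /usr/local/cuda8.0, while the files live under /usr/local/cuda-8.0), one option is to list every entry in PATH and LD_LIBRARY_PATH that is not an existing directory. This is only a rough sketch: `missing_dirs` is a hypothetical helper name of my own, and it has to be run from a shell that has already sourced ~/.bashrc.

```python
import os


def missing_dirs(path_value):
    """Return the entries of a path list (colon-separated on Linux)
    that are not existing directories."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    return [p for p in entries if not os.path.isdir(p)]


if __name__ == "__main__":
    # Report the entries that do not exist for both variables set in ~/.bashrc.
    for var in ("PATH", "LD_LIBRARY_PATH"):
        for entry in missing_dirs(os.environ.get(var, "")):
            print("%s: no such directory: %s" % (var, entry))
```

A missing entry is not necessarily an error, since the loader simply skips it, but a misspelled CUDA directory would show up in this list.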

Output from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | AD52:00:00.0     Off |                    0 |
| N/A   71C    P0    61W / 149W |      0MiB / 11439MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I'm using TensorFlow version 1.1.0, Ubuntu 16.04, and CUDA 8.0.

EDIT

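So I just tried removing the cuDNN files and loading TensorFlow, which gave me an error: something could not find libcudnn.so.5. So I think it does load the library, but I was under the impression that TensorFlow would log something along the lines of "libcudnn.so loaded successfully" if cuDNN were being used.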
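
A check that does not depend on TensorFlow's log output would be to ask the dynamic loader for the library directly. The sketch below assumes the soname libcudnn.so.5 from the listing above and uses `cudnnGetVersion`, the cuDNN C function that returns the library version as an integer; `cudnn_version` is a hypothetical helper name of my own.

```python
import ctypes


def cudnn_version(libname="libcudnn.so.5"):
    """Return the integer reported by cudnnGetVersion(), or None if the
    dynamic loader cannot find or open the library."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # cudnnGetVersion() is declared as returning size_t in cudnn.h.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return lib.cudnnGetVersion()


if __name__ == "__main__":
    version = cudnn_version()
    if version is None:
        print("libcudnn.so.5 could not be loaded; check LD_LIBRARY_PATH")
    else:
        print("cuDNN loaded, cudnnGetVersion() = %d" % version)
```

If this prints a version number from the same shell that starts Python for TensorFlow, the loader can find cuDNN, which would fit with the error I got after deleting the files.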
