(Posting here before filing an issue with tensorflow, as their issue template suggests.)
I am trying to build a tensorflow Docker image with Python 3.6, and I have the following Dockerfile:
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
build-essential \
curl \
libfreetype6-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
rsync \
software-properties-common \
unzip \
libcupti-dev \
&& add-apt-repository -y ppa:jonathonf/python-3.6 \
&& apt-get update \
&& apt-get install -y python3.6 python3.6-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN curl -O https://bootstrap.pypa.io/get-pip.py \
&& python3.6 get-pip.py \
&& rm get-pip.py
RUN python3.6 -m pip install --no-cache-dir -U ipython pip setuptools
RUN python3.6 -m pip install --no-cache-dir tensorflow
RUN ln -s /usr/bin/python3.6 /usr/bin/python
ENV LD_LIBRARY_PATH /usr/local/cuda-8.0/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV CUDA_HOME /usr/local/cuda-8.0
CMD ["ipython"]
I build the image and run a script that forces placement on gpu:0:
nvidia-docker build -t tensorflow .
... (builds successfully)
nvidia-docker run --rm -v $PWD/test.py:/test.py tensorflow python /test.py
...
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'b': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2] values: [1 2][3]...>, _device="/device:GPU:0"]()]]
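The error above reports only cpu:0 as an available device. As a minimal sketch for confirming what TensorFlow itself can see inside the container, the snippet below lists the local devices and filters for GPU entries. The helper names `visible_device_names` and `gpu_devices` are hypothetical and introduced only for illustration; `device_lib.list_local_devices()` is the TensorFlow call I am assuming is available in the installed version.

```python
def visible_device_names():
    """Return the names of all devices TensorFlow reports on this machine.

    The import is kept inside the function so that the filter below can be
    used even where TensorFlow is not installed.
    """
    from tensorflow.python.client import device_lib
    return [device.name for device in device_lib.list_local_devices()]


def gpu_devices(device_names):
    """Keep only the names that refer to a GPU (case-insensitive match)."""
    return [name for name in device_names if "gpu" in name.lower()]
```

Running `print(gpu_devices(visible_device_names()))` with the image's Python should print an empty list if TensorFlow really sees no GPU, which would match the error message.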
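I have tried the same script with the official GPU image, tensorflow/tensorflow:latest-gpu, and it works fine. So nvidia-docker and the GPU itself definitely work with tensorflow.
With the image I built, NVIDIA CUDA and cuDNN appear to be installed correctly: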
nvidia-docker run --rm tensorflow bash -c "nvidia-smi; nvcc --version; cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2"
Sun Jul 23 22:50:11 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Off | 0000:01:00.0 On | N/A |
| 21% 35C P8 1W / 38W | 795MiB / 976MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
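The output above shows that the toolkit and headers are present, but not whether the shared libraries can be found by the dynamic loader at runtime through `LD_LIBRARY_PATH`. A rough sketch of that check, using only the standard library, follows. The library file names are my assumption for CUDA 8.0 with cuDNN 5 and may differ on other setups; `can_load` and `missing_libraries` are hypothetical helper names.

```python
import ctypes

# Assumed sonames for CUDA 8.0 / cuDNN 5; adjust for other versions.
CUDA_LIBRARIES = ["libcudart.so.8.0", "libcublas.so.8.0", "libcudnn.so.5"]


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is not found or cannot be loaded.
        return False


def missing_libraries(library_names):
    """Return the subset of library names that could not be loaded."""
    return [name for name in library_names if not can_load(name)]
```

Calling `missing_libraries(CUDA_LIBRARIES)` inside the container returns the names that fail to load. An empty list would suggest that the library path is not the problem.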
What am I doing wrong?
(test.py is just):
import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
(I have already tried nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04, and also using tensorflow/tensorflow:latest-gpu as the base image, but to no avail.)