tensorflow - TensorFlow 不会使用自定义 docker 映像和 python 3.6 检测 GPU

Question

（在向 tensorflow 提交问题之前在此处发布，正如他们的问题模板所建议的那样）

我正在尝试使用 python 3.6 构建 tensorflow docker 映像，我有以下内容Dockerfile

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        rsync \
        software-properties-common \
        unzip \
        libcupti-dev \
 && add-apt-repository -y ppa:jonathonf/python-3.6 \
 && apt-get update \
 && apt-get install -y python3.6 python3.6-dev \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

RUN curl -O https://bootstrap.pypa.io/get-pip.py \
 && python3.6 get-pip.py \
 && rm get-pip.py

RUN python3.6 -m pip install --no-cache-dir -U ipython pip setuptools
RUN python3.6 -m pip install --no-cache-dir tensorflow
RUN ln -s /usr/bin/python3.6 /usr/bin/python

ENV LD_LIBRARY_PATH /usr/local/cuda-8.0/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV CUDA_HOME /usr/local/cuda-8.0

CMD ["ipython"]

我构建图像并运行强制脚本gpu:0：

nvidia-docker build -t tensorflow .
... (builds successfully)
nvidia-docker run --rm -v $PWD/test.py:/test.py tensorflow python /test.py
...
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'b': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
 [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2] values: [1 2][3]...>, _device="/device:GPU:0"]()]]

我已经用官方 gpu 图像尝试了相同的脚本，tensorflow/tensorflow:latest-gpu它工作正常。因此nvidia-docker，GPU 本身肯定适用于 tensorflow。

使用我构建的图像 nvidia cuda 和 cudnn 似乎安装正确：

nvidia-docker run --rm tensorflow bash -c "nvidia-smi; nvcc --version; cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2"
Sun Jul 23 22:50:11 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750     Off  | 0000:01:00.0      On |                  N/A |
| 21%   35C    P8     1W /  38W |    795MiB /   976MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

我究竟做错了什么？

（test.py只是）：

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

（我已经尝试过nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04使用tensorflow/tensorflow:latest-gpu但无济于事的基本图像）

score 1 · Accepted Answer

原来它就像安装tensorflow-gpunot一样简单tensorflow，奇怪的是 tensorflow 文档没有解释这一点，但基本上这是我很愚蠢。

tensorflow - TensorFlow 不会使用自定义 docker 映像和 python 3.6 检测 GPU

1 回答 1

Related

Reference