I have an Ubuntu 16.04 installation with 2 NVIDIA GPUs:
GPU 0: GeForce GT 610 (UUID: GPU-710e856e-358f-7b7d-95b7-e4eae7037c1f)
GPU 1: GeForce GTX TITAN X (UUID: GPU-5eacd6f3-f9e4-5795-c75c-26e34ced55ce)
Output of nvidia-smi:
Sun Jun 10 17:21:47 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 610 Off | 00000000:02:00.0 N/A | N/A |
| 40% 49C P8 N/A / N/A | 133MiB / 1985MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 00000000:03:00.0 Off | N/A |
| 22% 50C P8 15W / 250W | 2MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
I followed the steps at https://www.tensorflow.org/install/install_linux#InstallingAnaconda to install the Anaconda-based TensorFlow build for GPU. However, when I start a TF session, I get the following error:
Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 23:32:55)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> x = tf.Variable( "Hello..!" )
>>> sess = tf.Session()
2018-06-10 17:16:07.662527: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-10 17:16:07.843402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:03:00.0
totalMemory: 11.92GiB freeMemory: 11.80GiB
2018-06-10 17:16:07.880682: E tensorflow/core/common_runtime/direct_session.cc:154] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/miniconda2/envs/tf-gpu/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1560, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/opt/miniconda2/envs/tf-gpu/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 633, in __init__
self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
What am I missing? How can I get rid of this error?
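For reference, here is a minimal sketch of one thing that may be worth trying. It assumes (the log does not confirm this) that the failing "CUDA device ordinal 1" is the GT 610, since the log reports the GTX TITAN X as device 0. The idea is to hide the GT 610 from CUDA by setting the standard `CUDA_VISIBLE_DEVICES` environment variable to the TITAN X's UUID before TensorFlow is imported. The helper name `restrict_cuda_to` is hypothetical and only for illustration.

```python
import os

# UUID of the GTX TITAN X, copied from the GPU listing at the top of this post.
TITAN_X_UUID = "GPU-5eacd6f3-f9e4-5795-c75c-26e34ced55ce"


def restrict_cuda_to(device_id):
    """Hypothetical helper: expose only the given GPU to CUDA applications.

    CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so this
    must run before TensorFlow is imported in the same process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = device_id
    return os.environ["CUDA_VISIBLE_DEVICES"]


restrict_cuda_to(TITAN_X_UUID)

# Only after the variable is set (left commented so the sketch runs
# without TensorFlow installed):
# import tensorflow as tf
# sess = tf.Session()
```

The same effect can be had from the shell by exporting `CUDA_VISIBLE_DEVICES` before launching Python. Using the UUID rather than an index avoids depending on how CUDA orders the two cards, which may differ from the order shown by nvidia-smi.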