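Description
I am using two GTX 1080 Ti GPUs (11 GB each) with CUDA 9.1 and cuDNN 7.1 on Debian 9.
I want to run two models for prediction on two different GPUs to reduce run time, by creating two session instances, one per GPU. The first model should compute on the first GPU and the second model on the second GPU, with each session pinned to a specific device, similar to Python's "with tf.device('/gpu:0')".
Source code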
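int GPUID = std::stoi(params->getGpuDeviceStr());
// NOTE: the signature is setenv(name, value, overwrite). The value passed here is
// the empty string, and GPUID only acts as the overwrite flag, not as a device ID.
setenv("CUDA_VISIBLE_DEVICES", "", GPUID);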
std::cout << "Initial visible_device_list : "<<session_options.config.gpu_options().visible_device_list() << std::endl;
session_options.config.mutable_gpu_options()->set_allow_growth(true);
session_options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(
params->getGpuMemoryRatio());
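For comparison, here is a minimal sketch of passing a GPU index to CUDA_VISIBLE_DEVICES through POSIX setenv. The helper name set_visible_gpu is hypothetical and not part of the original code. The variable applies to the whole process and is only read when CUDA initializes, so on its own this would not place two sessions of the same process on different GPUs.

```cpp
#include <cstdlib>   // setenv (POSIX), std::getenv
#include <string>

// Hypothetical helper: restrict the current process to a single GPU index.
// Assumption: this is called before anything initializes the CUDA runtime.
bool set_visible_gpu(int gpu_id) {
    if (gpu_id < 0) {
        return false;  // negative indices are rejected in this sketch
    }
    const std::string value = std::to_string(gpu_id);
    // Third argument is the overwrite flag: nonzero replaces an existing value.
    return setenv("CUDA_VISIBLE_DEVICES", value.c_str(), 1) == 0;
}
```

A call such as set_visible_gpu(GPUID) would then export the index itself rather than an empty string.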
Output
2018-04-30 10:18:56.625199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1208] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:09:00.0
totalMemory: 10,91GiB freeMemory: 10,75GiB
2018-04-30 10:18:56.750435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1208] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:42:00.0
totalMemory: 10,91GiB freeMemory: 10,42GiB
2018-04-30 10:18:56.751296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1223] Device peer to peer matrix
2018-04-30 10:18:56.751324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1229] DMA: 0 1
2018-04-30 10:18:56.751332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1239] 0: Y Y
2018-04-30 10:18:56.751337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1239] 1: Y Y
2018-04-30 10:18:56.751345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1308] Adding visible gpu devices: 0, 1
2018-04-30 10:18:57.110046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10055 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-04-30 10:18:57.110819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10050 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:42:00.0, compute capability: 6.1)
Running tensorflow in Version 1.5.0
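Discussion
Running the two models simultaneously, by creating two threads that each run one model on its own GPU, gives no measurable improvement in elapsed time. Running two models on two GPUs takes roughly the same time as running them on a single GPU.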
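One way to check whether the two threads really overlap is to time the same two calls sequentially and then in parallel. The sketch below is only an illustration: fake_inference is a hypothetical stand-in for each model's session->Run(...) call, and the helper names are not from the original code.

```cpp
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since `start`.
static double elapsed_ms(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

// Wall-clock time of running both tasks one after the other.
double time_sequential_ms(const std::function<void()>& a,
                          const std::function<void()>& b) {
    const auto start = Clock::now();
    a();
    b();
    return elapsed_ms(start);
}

// Wall-clock time of running both tasks on separate threads.
double time_parallel_ms(const std::function<void()>& a,
                        const std::function<void()>& b) {
    const auto start = Clock::now();
    std::thread ta(a);
    std::thread tb(b);
    ta.join();
    tb.join();
    return elapsed_ms(start);
}

// Hypothetical placeholder for one model's inference call (100 ms of work).
void fake_inference() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```

If the parallel time for the real inference calls stays close to the sequential time, the two runs are probably being serialized somewhere, for example because both sessions end up on the same device.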