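If I get "Peer access not supported between device ordinals", can I still run training in some kind of multi-GPU setup? (As far as I understand, the message means the GPUs are "not connected" to each other.) For example, could each batch be computed separately on its own GPU and the results then merged on the CPU? As I understand it, that is how "batch accumulation" works in DIGITS with the Caffe backend.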
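To make the scheme in the question concrete, here is a minimal sketch of "compute per device, merge on the host". It is an illustration only: plain Python functions stand in for the per-GPU work, and the names `shard_gradient`, `averaged_gradient`, `train`, and the toy data are hypothetical, not TensorFlow or DIGITS APIs. It assumes equally sized shards, in which case the average of the per-shard gradients equals the gradient of the full batch.

```python
# Hypothetical illustration: plain Python stands in for the work done on each GPU.

def shard_gradient(w, xs, ys):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w over one shard."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def averaged_gradient(w, shards):
    """Compute one gradient per shard (one shard per device), then average on the host."""
    grads = [shard_gradient(w, xs, ys) for xs, ys in shards]
    return sum(grads) / len(grads)


def train(shards, w=0.0, lr=0.05, steps=200):
    """Plain gradient descent driven by the host-side averaged gradient."""
    for _ in range(steps):
        w -= lr * averaged_gradient(w, shards)
    return w


# Four equally sized shards, one per GPU in a 4-GPU machine; toy data follows y = 3x.
shards = [
    ([1.0, 2.0], [3.0, 6.0]),
    ([3.0, 4.0], [9.0, 12.0]),
    ([0.5, 1.5], [1.5, 4.5]),
    ([2.5, 3.5], [7.5, 10.5]),
]
w_fit = train(shards)
```

In this sketch the devices never exchange data with each other directly; only the host combines the results, which is the property the question is asking about.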
Raw output:
2017-05-10 15:27:54.360688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 1
2017-05-10 15:27:54.360949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 2
2017-05-10 15:27:54.361504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 3
2017-05-10 15:27:54.361738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 0
2017-05-10 15:27:54.361892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 2
2017-05-10 15:27:54.362065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 3
2017-05-10 15:27:54.362263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 0
2017-05-10 15:27:54.362485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 1
2017-05-10 15:27:54.362693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 3
2017-05-10 15:27:54.362885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 0
2017-05-10 15:27:54.362927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 1
2017-05-10 15:27:54.362967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 2
2017-05-10 15:27:54.364638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3
2017-05-10 15:27:54.364668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y N N N
2017-05-10 15:27:54.364687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1: N Y N N
2017-05-10 15:27:54.364702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2: N N Y N
2017-05-10 15:27:54.364717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3: N N N Y