python - tf.data.Iterator with a multi-GPU setup

Question

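I looked at the cifar10 multi-GPU implementation for inspiration on parallelizing my own model across GPUs.

My model consumes data from TFRecords, which are iterated through the tf.data.Iterator class. So, given 2 GPUs, what I am trying to do is call iterator.get_next() on the CPU once per GPU (i.e. twice), do some preprocessing, embedding lookup and other CPU-related work, and then feed the two batches to the GPUs.

Pseudocode: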

with tf.device('/cpu:0'):
    batches = []
    for gpu in multiple_gpus:
        single_gpu_batch = cpu_function(iterator.get_next())
        batches.append(single_gpu_batch)

    ....................

for gpu, batch in zip(multiple_gpus, batches):
    with tf.device('/device:GPU:{}'.format(gpu.id)):
        single_gpu_loss = inference_and_loss(batch)
        tower_losses.append(single_gpu_loss)
        ...........
        ...........

total_loss = average_loss(tower_losses)

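The problem is that if there is only 1 example or fewer left to draw from the data and I call iterator.get_next() twice, a tf.errors.OutOfRangeError exception is raised, and the data from the first iterator.get_next() call (which did not actually fail; only the second call did) never passes through a GPU.

I thought about drawing the data in a single iterator.get_next() call and splitting it later, but tf.split fails when the batch size is not divisible by the number of GPUs.

What is the right way to implement iterator consumption in a multi-GPU setup?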

score 3 · Accepted Answer

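I think the second suggestion is the easiest way to go. To avoid the splitting problem on the last batch, you can use the drop_remainder option of dataset.batch. Or, if you need to see all the data, one possible solution is to set the split dimensions explicitly from the size of the drawn batch, so that the split operation never fails: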

dataset = dataset.batch(batch_size * multiple_gpus)
iterator = dataset.make_one_shot_iterator()
batches = iterator.get_next()

split_dims = [0] * multiple_gpus
drawn_batch_size = tf.shape(batches)[0]

Either in a greedy way, i.e. fitting batch_size examples on each device until the drawn batch runs out:

#### Solution 1 [Greedy]: 
for i in range(multiple_gpus):
  split_dims[i] = tf.maximum(0, tf.minimum(batch_size, drawn_batch_size))
  drawn_batch_size -= batch_size

Or in a more spread-out way that ensures each device gets at least one sample (assuming multiple_gpus < drawn_batch_size):

### Solution 2 [Spread]
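# Reserve one sample per device, then hand out the remainder greedily.
drawn_batch_size -= multiple_gpus
for i in range(multiple_gpus):
  split_dims[i] = tf.maximum(0, tf.minimum(batch_size - 1, drawn_batch_size)) + 1
  # Only batch_size - 1 samples come out of the remainder; the +1 was reserved above.
  drawn_batch_size -= batch_size - 1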

## Split batches
batches = tf.split(batches, split_dims)
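The arithmetic behind the two strategies can be checked without TensorFlow. The following is a minimal pure-Python sketch, not part of the original answer: the function names greedy_split_sizes and spread_split_sizes and the parameter num_gpus are hypothetical, and the spread variant assumes num_gpus <= drawn_batch_size <= batch_size * num_gpus. In both cases the sizes must sum to the drawn batch size, otherwise tf.split would reject them.

```python
def greedy_split_sizes(drawn_batch_size, batch_size, num_gpus):
    """Fill each device with up to batch_size examples until none remain."""
    sizes = []
    remaining = drawn_batch_size
    for _ in range(num_gpus):
        sizes.append(max(0, min(batch_size, remaining)))
        remaining -= batch_size
    return sizes


def spread_split_sizes(drawn_batch_size, batch_size, num_gpus):
    """Reserve one example per device, then hand out the rest greedily.

    Assumption: num_gpus <= drawn_batch_size <= batch_size * num_gpus.
    """
    sizes = []
    # One example is set aside for every device up front.
    remaining = drawn_batch_size - num_gpus
    for _ in range(num_gpus):
        sizes.append(max(0, min(batch_size - 1, remaining)) + 1)
        remaining -= batch_size - 1
    return sizes


# A short final batch of 3 examples, with batch_size = 4 on 2 GPUs.
print(greedy_split_sizes(3, 4, 2))  # [3, 0]
print(spread_split_sizes(3, 4, 2))  # [2, 1]
```

Note that the greedy variant can hand a device a split of size 0, so a per-tower loss that averages over the batch dimension would need to handle an empty tensor; the spread variant avoids this as long as its assumption holds.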
