multi-gpu - Variable.assign(value) on multiple GPUs with TensorFlow 2

Question

I have a model that works perfectly on a single GPU, as follows:

alpha = tf.Variable(alpha,
                    name='ws_alpha',
                    trainable=False,
                    dtype=tf.float32,
                    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
                   )

...
class CustomModel(tf.keras.Model):

    @tf.function
    def train_step(self, inputs):
        ...
        # Update the non-trainable blending weight once per training step.
        alpha.assign_add(increment)

...


model.fit(dataset, epochs=10)

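However, when I run on multiple GPUs, the assignment does not take effect. It works for the first two training steps, and then the value stays the same for the rest of the epoch.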

alpha is used for a weighted sum of two layers, e.g. out = a*Layer1 + (1-a)*Layer2. It is not a trainable parameter, but something similar to a step_count variable.

Does anyone have experience with assigning a single value in a multi-GPU setup in TensorFlow 2?

Should the variable instead be created as:

with tf.device("CPU:0"):
    alpha = tf.Variable()

?

score 0 · Accepted Answer

A simple fix, according to a TensorFlow issue: set the variable's synchronization to ON_READ.

alpha = tf.Variable(alpha,
                    name='ws_alpha',
                    trainable=False,
                    dtype=tf.float32,
                    aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
                    synchronization=tf.VariableSynchronization.ON_READ,
                   )
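
As I understand it, with ON_READ each replica keeps its own local copy of the variable and updates it without cross-replica synchronization; when the variable is read outside the replica context, ONLY_FIRST_REPLICA returns the first replica's value. Below is a minimal sketch of how this can be checked in isolation. The use of tf.distribute.MirroredStrategy, the step helper, and the 0.1 increment are assumptions made for illustration and are not taken from the question; the important part is that the variable is created inside strategy.scope().

```python
import tensorflow as tf

# Assumption: MirroredStrategy over all visible devices. It also runs on a
# single GPU or on CPU only, in which case there is just one replica.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Created inside the scope so that it becomes a distributed variable.
    alpha = tf.Variable(
        0.0,
        name="ws_alpha",
        trainable=False,
        dtype=tf.float32,
        aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
        synchronization=tf.VariableSynchronization.ON_READ,
    )


@tf.function
def step(increment):
    # Hypothetical stand-in for train_step: every replica bumps its own copy.
    def replica_fn(inc):
        alpha.assign_add(inc)

    strategy.run(replica_fn, args=(increment,))


for _ in range(3):
    step(tf.constant(0.1))

# Read in cross-replica context: returns the first replica's value.
final_alpha = float(alpha.numpy())
print(final_alpha)
```

If the printed value keeps growing with the number of calls to step, the assignment is being applied on every step, which is the behavior the question was missing.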
