I want to know how to apply gradient clipping in TensorFlow when training in a distributed setting. Here is my code:

    @lazy_property
    def optimize(self):
        # train_vars = ...
        optimizer = tf.train.AdamOptimizer(self._learning_rate)
        # Wrap the base optimizer so gradients are aggregated across replicas
        # before being applied
        self.syn_op = tf.train.SyncReplicasOptimizer(optimizer,
                                                     replicas_to_aggregate=self.gradient_merge,
                                                     total_num_replicas=self.worker_count,
                                                     use_locking=True)
        self.sync_replicas_hook = self.syn_op.make_session_run_hook(is_chief=self.is_chief)
        return self.syn_op.minimize(self.cost, var_list=train_vars, global_step=self.global_step)

I have read this answer: How to apply gradient clipping in TensorFlow. Here is the gradient-clipping code from that answer:

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    gvs = optimizer.compute_gradients(cost)
    capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
    train_op = optimizer.apply_gradients(capped_gvs)

Where should I change my code to apply gradient clipping in my case?
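
My current guess, since minimize() is just compute_gradients() followed by apply_gradients(), is to decompose the call on the SyncReplicasOptimizer the same way. Below is a minimal sketch of what I have in mind; the `if grad is not None` filter is my own addition to skip variables that receive no gradient, and I have not verified that clipping on each replica before the wrapper aggregates the gradients behaves correctly:

    # Replace the final minimize() call inside optimize() with:
    gvs = self.syn_op.compute_gradients(self.cost, var_list=train_vars)
    # Clip each gradient on this replica; SyncReplicasOptimizer then
    # aggregates the (already clipped) gradients inside apply_gradients()
    capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var)
                  for grad, var in gvs if grad is not None]
    return self.syn_op.apply_gradients(capped_gvs, global_step=self.global_step)

Is this the right place to clip, or should the clipping be done on the underlying AdamOptimizer instead?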
