tensorflow - Is there a way to clip intermediate exploding gradients in TensorFlow?

Question

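Problem: a very long RNN network

N1 -- N2 -- ... --- N100

For an optimizer such as AdamOptimizer, compute_gradients() returns the gradients for all trainable variables.

However, the gradient may explode at some intermediate step.

How can I clip those intermediate gradients?

One way might be to run backpropagation manually from "N100 --> N99", clip the gradients, then continue with "N99 --> N98", and so on, but that is too complicated.

So my question is: is there an easier way to clip the intermediate gradients? (Strictly speaking, of course, they are then no longer gradients in the mathematical sense.)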

score 2 · Accepted Answer

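import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
  # Forward pass: identity. Backward pass: clip the incoming gradient to L2 norm 10.
  return x, lambda dy: tf.clip_by_norm(dy, 10.0)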
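
To see what this does at an intermediate node, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution and `tf.GradientTape`, neither of which appears in the answer; the input values and the factor of 100 are made up for illustration.

```python
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Identity in the forward pass; the backward pass clips the
    # incoming gradient to an L2 norm of at most 10.
    return tf.identity(x), lambda dy: tf.clip_by_norm(dy, 10.0)

x = tf.constant([3.0, 4.0])  # illustrative input, not from the answer
with tf.GradientTape() as tape:
    tape.watch(x)
    h = gradient_clipping(x)         # intermediate node with clipped backward pass
    loss = tf.reduce_sum(100.0 * h)  # gradient arriving at h is [100, 100]

grad = tape.gradient(loss, x)
grad_norm = float(tf.norm(grad))     # norm of the clipped gradient reaching x
```

The forward value of `h` is unchanged; only the gradient flowing back through it is rescaled when its norm exceeds the threshold.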

answered 2019-09-11T21:59:53.503

score 0 · Accepted Answer

You can use the custom_gradient decorator to build a version of tf.identity that clips intermediate exploding gradients.

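```
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe  # TensorFlow 1.x only

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
  result = tf.identity(tensor)

  def grad(dresult):
    # Clip the incoming gradient; max_norm itself gets no gradient.
    return tf.clip_by_norm(dresult, max_norm), None

  return result, grad
```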

Then use gradient_clipping_identity wherever you would normally use identity, and your gradients will be clipped in the backward pass.
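
As a rough illustration of placing such an identity between the steps of a long unrolled chain, the sketch below assumes TensorFlow 2.x (`tf.custom_gradient`, `tf.GradientTape`) rather than the `tf.contrib` API above. The helper `make_clipping_identity`, the toy linear recurrence, the weight, and the step count are all hypothetical and chosen only to make the gradient grow.

```python
import tensorflow as tf

def make_clipping_identity(max_norm):
    """Hypothetical helper: identity whose backward pass clips to max_norm."""
    @tf.custom_gradient
    def clipping_identity(tensor):
        def grad(dresult):
            return tf.clip_by_norm(dresult, max_norm)
        return tf.identity(tensor), grad
    return clipping_identity

def input_grad_norm(use_clip, steps=20):
    # Toy linear "RNN": h_{k+1} = w * h_k, so the unclipped gradient
    # with respect to h_0 grows like w ** steps.
    clip = make_clipping_identity(1.0)
    w = tf.constant(2.0)
    h0 = tf.constant([1.0, 1.0])
    with tf.GradientTape() as tape:
        tape.watch(h0)
        h = h0
        for _ in range(steps):
            h = w * (clip(h) if use_clip else h)
        loss = tf.reduce_sum(h)
    return float(tf.norm(tape.gradient(loss, h0)))

unclipped = input_grad_norm(use_clip=False)  # grows exponentially with steps
clipped = input_grad_norm(use_clip=True)     # bounded by max_norm at every step
```

Because the clip is applied at every step, the gradient is rescaled each time it passes backward through a node, which is the per-step behavior the question asks for without manual backpropagation.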
