numpy - 神经网络权重以线性单位爆炸

Question

我目前正在用 numpy 在 Python 中实现一个简单的神经网络和反向传播算法。我已经使用中心差异测试了我的反向传播方法，结果梯度是相等的。

但是，网络无法逼近简单的正弦曲线。该网络有一个具有 tanh 激活函数的隐藏层（100 个神经元）和一个具有线性激活函数的输出层。每个单元也有一个偏置输入。训练是通过简单的梯度下降来完成的，学习率为 0.2。

问题出在梯度上，随着每个时代变大，梯度会变大，但我不知道为什么？此外，如果我降低学习率，问题不会改变。

编辑：我已将代码上传到 pastebin：http://pastebin.com/R7tviZUJ

score 1 · Accepted Answer

您可以尝试两件事，也许可以结合使用：

使用较小的学习率。如果它太高，您可能会在当前方向上超出最小值很多，因此您的权重将不断变大。
使用较小的初始权重。这与第一项有关。较小的学习率也可以解决这个问题。

score 1 · Accepted Answer

我有一个类似的问题（使用不同的库，DL4J），即使在非常简单的目标函数的情况下也是如此。就我而言，问题原来是成本函数。当我从负对数可能性更改为泊松或 L2 时，我开始获得不错的结果。（一旦我加入指数学习率衰减，我的结果就会好得多。）

score 0 · Accepted Answer

A too big learning rate can fail to converge, and even DIVERGE, that is the point.

The gradient could diverge for this reason: when exceeding the position of the minima, the resulting point could not only be a bit further, but could even be at a greater distance than initially, but the other side. Repeat the process, and it will continue to diverge. in other words, the variation rate around the optimal position could be just to big compared to the learning rate.

Source: my understanding of the following video (watch near 7:30). https://www.youtube.com/watch?v=Fn8qXpIcdnI&list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4&index=10

score 0 · Accepted Answer

看起来你不使用正则化。如果你训练你的网络足够长的时间，它将开始学习精确的数据而不是抽象的模式。

有几种方法可以规范您的网络，例如：停止训练、对大梯度投入高成本或更复杂的方法，例如鸡蛋脱落。如果您搜索网络/书籍，您可能会发现很多选项。

numpy - 神经网络权重以线性单位爆炸

4 回答 4

Related

Reference