I am learning about regularization in neural networks from the deeplearning.ai course. In the dropout regularization lecture, the professor says that when dropout is applied, the computed activations are smaller than they would be without dropout (i.e. at test time), so we need to scale the activations up to keep the test phase simple.
I understand this fact, but I don't understand how the scaling is done. Here is a code sample that implements inverted dropout.
import numpy as np

keep_prob = 0.8  # 0 <= keep_prob <= 1: probability that a unit is kept
l = 3            # this code is only for layer 3
# entries of the random matrix that are less than keep_prob become True (kept);
# the rest become False (dropped), so ~80% of units stay and ~20% are dropped
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
a3 = np.multiply(a3, d3)  # zero out the dropped units
# scale a3 up so the expected value of the output is not reduced
# (this solves the scaling problem and keeps the test phase simple)
a3 = a3 / keep_prob
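As a quick sanity check of what the mask does (a minimal sketch; the a3 below is a dummy matrix I made up for illustration), I can see that dropping units without rescaling shrinks the mean activation by roughly the factor keep_prob:

import numpy as np

np.random.seed(0)
a3 = np.random.rand(5, 1000)       # dummy activations, made up for illustration
keep_prob = 0.8
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
print(d3.mean())                   # ~0.8: fraction of units kept
print(a3.mean())                   # ~0.5: mean activation before dropout
print(np.multiply(a3, d3).mean())  # ~0.4: shrunk by roughly the factor keep_prob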
In the code above, why are the activations divided by 0.8, i.e. by the probability of keeping a node in the layer (keep_prob)? Any numerical example would help.
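My own guess, sketched as a made-up single-unit calculation (the activation value 100 is arbitrary), is that the division undoes the shrinkage in expectation:

keep_prob = 0.8
a = 100.0                                          # activation of one unit (value made up)
expected = keep_prob * a + (1 - keep_prob) * 0.0   # 0.8*100 + 0.2*0 = 80.0
print(expected / keep_prob)                        # 100.0: back to the no-dropout value

Is that the correct intuition?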