
I have the following code.

import keras

# Functional-API MLP: 4096 input features -> 512 hidden units -> 80 sigmoid outputs
x = keras.layers.Input(batch_shape=(None, 4096))
hidden = keras.layers.Dense(512, activation='relu')(x)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
predictions = keras.layers.Dense(80, activation='sigmoid')(hidden)
# Keras 1.x keyword names; Keras 2 and later spell these inputs= and outputs=
mlp_model = keras.models.Model(input=[x], output=[predictions])
mlp_model.summary()

This is the model summary:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_3 (InputLayer)             (None, 4096)          0                                            
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 512)           2097664     input_3[0][0]                    
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 512)           2048        dense_1[0][0]                    
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 512)           0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 80)            41040       dropout_1[0][0]                  
====================================================================================================
Total params: 2,140,752
Trainable params: 2,139,728
Non-trainable params: 1,024
____________________________________________________________________________________________________

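The input to the BatchNormalization (BN) layer has size 512. According to the Keras documentation, the output shape of a BN layer is the same as its input, i.e. 512.

So how can the number of parameters associated with the BN layer be 2048?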


2 Answers


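These 2048 parameters are in fact [gamma weights, beta weights, moving_mean (non-trainable), moving_variance (non-trainable)], each with 512 elements (the size of the BN layer's input). Only gamma and beta are trained, which is why the summary reports 2 × 512 = 1,024 non-trainable parameters.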
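To double-check the counts against the summary in the question, here is a small pure-Python sketch. It does not call Keras at all; `dense_params` and `batch_norm_params` are hypothetical helper names introduced only for this illustration, and the layer sizes are taken from the question.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batch_norm_params(n_features):
    """Return (trainable, non_trainable) counts for a BN layer.

    Trainable: gamma and beta. Non-trainable: moving_mean and moving_variance.
    Each of the four vectors has one entry per feature.
    """
    trainable = 2 * n_features
    non_trainable = 2 * n_features
    return trainable, non_trainable


# Layer sizes from the question: 4096 -> 512 -> BN -> Dropout -> 80
dense_1 = dense_params(4096, 512)
bn_trainable, bn_non_trainable = batch_norm_params(512)
dense_2 = dense_params(512, 80)

total = dense_1 + bn_trainable + bn_non_trainable + dense_2
trainable_total = total - bn_non_trainable

print("dense_1:", dense_1)
print("batch norm:", bn_trainable + bn_non_trainable)
print("dense_2:", dense_2)
print("total:", total)
print("trainable:", trainable_total)
print("non-trainable:", bn_non_trainable)
```

The printed values should line up with the Param # column and the three totals at the bottom of the summary. Dropout contributes nothing because it has no weights.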

answered 2017-07-31T16:04:54.613

Batch normalization in Keras implements this paper.

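As you can read there, to make batch normalization work during training, the layer needs to keep track of the distribution of each normalized dimension. To do so, since you are in mode=0 by default, it keeps 4 parameters per feature of the previous layer's output. These parameters ensure that information is propagated forward and backpropagated correctly.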

So 4 * 512 = 2048, which should answer your question.
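For intuition about what those four per-feature values do, here is a rough sketch of feature-wise batch normalization for a single feature, in plain Python. It is not the Keras implementation: the function names are hypothetical, and the `momentum` and `eps` defaults are illustrative assumptions.

```python
import math


def batch_norm_train_step(batch, gamma, beta, moving_mean, moving_var,
                          momentum=0.99, eps=1e-3):
    """One training-mode step for a single feature.

    batch is a list with this feature's value for every sample in the
    mini-batch. momentum and eps are assumed values for illustration.
    Returns (outputs, new_moving_mean, new_moving_var).
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n

    # Normalize with the batch statistics, then scale by gamma and shift by beta.
    outputs = [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

    # The moving statistics are updated by averaging, not by gradient descent,
    # which is why they count as non-trainable parameters.
    new_moving_mean = momentum * moving_mean + (1 - momentum) * mean
    new_moving_var = momentum * moving_var + (1 - momentum) * var
    return outputs, new_moving_mean, new_moving_var


def batch_norm_inference(value, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference-mode output for one value, using the stored moving statistics."""
    return gamma * (value - moving_mean) / math.sqrt(moving_var + eps) + beta


outputs, mm, mv = batch_norm_train_step([1.0, 2.0, 3.0], gamma=1.0, beta=0.0,
                                        moving_mean=0.0, moving_var=1.0)
print(outputs, mm, mv)
```

With 512 features, each of gamma, beta, the moving mean and the moving variance becomes a vector of length 512, which gives the 4 * 512 total.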

answered 2017-03-01T06:15:52.517