When using load_weights() and save_weights() with a nested model, it is very easy to get an error if the trainable settings differ between saving and loading. To solve the error, make sure you freeze the same layers before calling model.load_weights(). That is, if the weight file was saved with all layers frozen, the procedure is:
- Recreate the model
- Freeze all layers in base_model
- Load the weights
- Unfreeze the layers you want to train (in this case, base_model.layers[-26:])
For example,
from keras.applications import ResNet50
from keras.models import Sequential
from keras.layers import Flatten, Dense

base_model = ResNet50(include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))

# Freeze the same layers that were frozen when the weights were saved
for layer in base_model.layers:
    layer.trainable = False
model.load_weights('all_layers_freezed.h5')

# Only now unfreeze the layers you want to fine-tune
for layer in base_model.layers[-26:]:
    layer.trainable = True
Root cause:
When you call model.load_weights(), the weights of each layer are (roughly) loaded by the following steps (in the function load_weights_from_hdf5_group() in topology.py):
- Call layer.weights to get the weight tensors
- Match each weight tensor with its corresponding weight value in the hdf5 file
- Call K.batch_set_value() to assign the weight values to the weight tensors
If your model is a nested model, you have to be careful about trainable because of Step 1.
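The positional matching in the steps above can be illustrated with a hedged, pure-Python sketch (ToyNestedModel and positional_load are hypothetical stand-ins, not Keras API): the saver and the loader both flatten a nested model's weights via trainable + non_trainable, so flipping a trainable flag between save and load reorders the list and the values land on the wrong tensors.

```python
class ToyNestedModel:
    """Mimics how a nested Keras model exposes its weights (names stand in for tensors)."""
    def __init__(self, trainable_names, frozen_names):
        self.trainable_weights = list(trainable_names)
        self.non_trainable_weights = list(frozen_names)

    @property
    def weights(self):
        # Same rule as Layer.weights: trainable first, then non-trainable
        return self.trainable_weights + self.non_trainable_weights

def positional_load(saved_values, model):
    # Step 2 above: weight values are matched by POSITION, not by name
    return dict(zip(model.weights, saved_values))

# Save with everything frozen: order is [] + ['conv1', 'dense']
saved_order = ToyNestedModel([], ['conv1', 'dense']).weights
saved_values = ['value_of_' + name for name in saved_order]

# Load after unfreezing 'dense': order is ['dense'] + ['conv1'] -- mismatched
assignments = positional_load(saved_values, ToyNestedModel(['dense'], ['conv1']))
print(assignments)
# 'dense' receives conv1's value; in real Keras this surfaces as a shape error
```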
I'll explain it with an example. For the same model as above, model.summary() gives:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 1, 1, 2048) 23587712
_________________________________________________________________
flatten_10 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_5 (Dense) (None, 80) 163920
=================================================================
Total params: 23,751,632
Trainable params: 11,202,640
Non-trainable params: 12,548,992
_________________________________________________________________
The inner ResNet50 model is treated as a single layer of model during weight loading. When loading the layer resnet50, in Step 1, calling layer.weights is equivalent to calling base_model.weights. The list of weight tensors of all layers in the ResNet50 model is collected and returned.
Now the problem is that, when constructing the list of weight tensors, the trainable weights come before the non-trainable weights. In the definition of the Layer class:
@property
def weights(self):
return self.trainable_weights + self.non_trainable_weights
If all layers in base_model are frozen, the weight tensors will be in the following order:
for layer in base_model.layers:
    layer.trainable = False
print(base_model.weights)
[<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
However, if some layers are trainable, the weight tensors of the trainable layers come before those of the frozen ones:
for layer in base_model.layers[-5:]:
    layer.trainable = True
print(base_model.weights)
[<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'bn5c_branch2b/moving_mean:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2b/moving_variance:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
This change in order is why you get an error about tensor shapes: the weight values saved in the hdf5 file are matched to the wrong weight tensors in Step 2 above. The reason everything works when you freeze all layers is that your model checkpoint was also saved with all layers frozen, so the order is correct.
A possibly better solution:
You can avoid nesting models by using the functional API. For example, the following code should work without problems:
from keras.applications import ResNet50
from keras.models import Model
from keras.layers import Flatten, Dense

# Save with all layers frozen
# (input_size and input_channels are assumed to be defined elsewhere)
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)
for layer in base_model.layers:
    layer.trainable = False
model.save_weights("all_nontrainable.h5")

# Recreate the model with different trainable settings and load:
# the model is flat, so there is no reordering across layers
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)
for layer in base_model.layers[:-26]:
    layer.trainable = False
model.load_weights("all_nontrainable.h5")
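To see why flattening helps, here is a hedged sketch (ToyLayer, flat_order, and nested_order are illustrative, not Keras API): in a flat model, weights are grouped per layer, and within a single layer the trainable + non_trainable split preserves relative order whether the layer is frozen or not; only a nested model, which concatenates trainable weights across all sublayers first, reorders its list when flags change.

```python
class ToyLayer:
    """Mimics a Keras layer's weight bookkeeping (names stand in for tensors)."""
    def __init__(self, name, weight_names, trainable=True):
        self.name = name
        self.weight_names = [name + '/' + w for w in weight_names]
        self.trainable = trainable

    @property
    def trainable_weights(self):
        return self.weight_names if self.trainable else []

    @property
    def non_trainable_weights(self):
        return [] if self.trainable else self.weight_names

    @property
    def weights(self):
        return self.trainable_weights + self.non_trainable_weights

def flat_order(layers):
    # Flat model: weights are collected layer by layer, so order never changes
    return [w for layer in layers for w in layer.weights]

def nested_order(layers):
    # Nested model: ALL trainable weights first, then ALL frozen ones
    trainable = [w for layer in layers for w in layer.trainable_weights]
    frozen = [w for layer in layers for w in layer.non_trainable_weights]
    return trainable + frozen

conv = ToyLayer('conv1', ['kernel', 'bias'], trainable=False)
dense = ToyLayer('dense', ['kernel', 'bias'], trainable=True)

print(flat_order([conv, dense]))    # conv1 stays first regardless of flags
print(nested_order([conv, dense]))  # trainable dense jumps ahead of frozen conv1
```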