python - Strange behaviour of the loss function in a keras model with a pretrained convolutional base

Question

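I'm trying to create a model in Keras to make numerical predictions from pictures. My model has a densenet121 convolutional base with a couple of additional layers on top. All layers except the last two are set to layer.trainable = False. My loss is mean squared error, since it is a regression task. During training I get loss: ~3, while evaluation on the very same batch of data gives loss: ~30: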

model.fit(x=dat[0],y=dat[1],batch_size=32)

Epoch 1/1
32/32 [===============================] - 0s 11ms/step - loss: 2.5571

model.evaluate(x=dat[0],y=dat[1])

32/32 [===============================] - 2s 59ms/step
29.276123046875

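I am feeding exactly the same 32 pictures during training and evaluation. I also calculated the loss from the predicted values, y_pred=model.predict(dat[0]), and then computed the mean squared error with numpy. The result was the same as what I got from evaluation (i.e. 29.276123...).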
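For reference, a manual check of this kind can be sketched as follows. The helper name `manual_mse` and the sample arrays are illustrative and not from the original post; with a real model, `y_pred` would come from `model.predict(dat[0])`. Flattening both arrays guards against a common pitfall: subtracting an `(n, 1)` array from an `(n,)` array silently broadcasts to `(n, n)` and gives a wrong loss.

```python
import numpy as np

def manual_mse(y_true, y_pred):
    """Mean squared error over all samples, computed with plain NumPy."""
    # Flatten both arrays so shapes (n,) and (n, 1) cannot broadcast to (n, n).
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative stand-ins for dat[1] and model.predict(dat[0]).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([[1.5], [2.0], [2.0]])

print("manual MSE:", manual_mse(y_true, y_pred))
# Without flattening, the subtraction broadcasts to a 3x3 matrix.
print("naive MSE: ", float(np.mean((y_true - y_pred) ** 2)))
```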

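It has been suggested that this behaviour may be caused by the BatchNormalization layers in the convolutional base (discussed on github). Of course, all BatchNormalization layers in my model have also been set to layer.trainable=False. Maybe somebody has encountered this problem and figured out a solution?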

score 12 · Accepted Answer

Looks like I found the solution. As I suggested, the problem is with the BatchNormalization layers. They do three things:

subtract mean and normalize by std
collect statistics on mean and std using running average
train two additional parameters (two per node).

When one sets trainable to False, these two parameters are frozen and the layer also stops collecting statistics on the mean and std. But it looks like the layer still normalizes with the statistics of the training batch during training time. Most likely it's a bug in keras, or maybe they did it on purpose for some reason. As a result, the forward-propagation calculations during training time differ from those at prediction time, even though the trainable attribute is set to False.
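To make the mismatch concrete, here is a minimal NumPy sketch of the two modes of a batch normalization layer. It is not Keras's implementation: the function `batchnorm_forward`, the `eps` value and the synthetic data are assumptions chosen for illustration. The batch is drawn with statistics that differ from the stored ("pretrained") ones, as happens when a pretrained base sees a custom dataset.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    """Forward pass of batch normalization over the batch axis."""
    if training:
        # Training mode: normalize with the statistics of the current batch.
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        # Inference mode: normalize with the stored running statistics.
        mean, var = moving_mean, moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# A batch whose statistics (mean about 5, std about 2) differ from the stored ones.
x = rng.normal(loc=5.0, scale=2.0, size=(32, 4))
gamma, beta = np.ones(4), np.zeros(4)
moving_mean, moving_var = np.zeros(4), np.ones(4)  # stand-in "pretrained" statistics

out_train = batchnorm_forward(x, gamma, beta, moving_mean, moving_var, training=True)
out_infer = batchnorm_forward(x, gamma, beta, moving_mean, moving_var, training=False)

print("training-mode mean per feature: ", out_train.mean(axis=0))
print("inference-mode mean per feature:", out_infer.mean(axis=0))
```

The same input produces centred activations in training mode and strongly shifted ones in inference mode, so every layer after it sees different inputs in fit() and in evaluate().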

There are two possible solutions I can think of:

Set all BatchNormalization layers to trainable. In this case these layers will collect statistics from your dataset instead of using the pretrained ones (which can be significantly different!), so all the BatchNorm layers are adjusted to your custom dataset during training.
Split the model into two parts, model=model_base+model_top. After that, use model_base to extract features with model_base.predict(), then feed these features into model_top and train only model_top.
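The second option can be sketched without any framework. In the sketch below, `model_base_predict` is a hypothetical stand-in for `model_base.predict()`: a fixed function with no batch-dependent statistics. The "top" is a single linear layer trained by plain gradient descent. All names, shapes and the learning rate are assumptions for illustration, not the original model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen pretrained base: fixed weights and no
# batch-dependent statistics, so it behaves identically on every call.
W_base = rng.normal(size=(20, 8))

def model_base_predict(x):
    return np.maximum(x @ W_base, 0.0)  # fixed linear map followed by ReLU

def top_loss(features, y, w, b):
    return float(np.mean((features @ w + b - y) ** 2))

x = rng.random((32, 20))
y = rng.random((32, 1))

# Step 1: run the base once, in inference mode, and keep the features.
features = model_base_predict(x)

# Step 2: train only the top on those features.
w_top, b_top = np.zeros((8, 1)), np.zeros(1)
lr = 1e-3  # small illustrative learning rate
initial_loss = top_loss(features, y, w_top, b_top)
for _ in range(200):
    err = features @ w_top + b_top - y
    w_top -= lr * 2.0 * features.T @ err / len(y)
    b_top -= lr * 2.0 * err.mean(axis=0)

train_loss = top_loss(features, y, w_top, b_top)
eval_loss = top_loss(model_base_predict(x), y, w_top, b_top)
print("initial:", initial_loss, "train:", train_loss, "eval:", eval_loss)
```

Because the base is only ever run in inference mode, the loss seen while training the top and the loss seen at evaluation come from the same forward pass.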

I've just tried the first solution and it looks like it's working:

model.fit(x=dat[0],y=dat[1],batch_size=32)

Epoch 1/1
32/32 [==============================] - 1s 28ms/step - loss: 3.1053

model.evaluate(x=dat[0],y=dat[1])

32/32 [==============================] - 0s 10ms/step
2.487905502319336

This was after some training: one needs to wait until enough statistics on the mean and std have been collected.
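Why some waiting is needed can be seen from how a running average warms up. The sketch below assumes an exponential moving average with momentum 0.99 (the default of Keras's BatchNormalization) and synthetic data. The starting value of 0 and the dataset mean of 5 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
momentum = 0.99    # assumed value; the default of Keras's BatchNormalization
true_mean = 5.0    # mean of the synthetic "custom dataset"
moving_mean = 0.0  # illustrative statistic inherited from the pretrained model

history = []
for step in range(1000):
    batch = rng.normal(loc=true_mean, scale=2.0, size=32)
    # Exponential moving average update, applied once per training batch.
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch.mean()
    history.append(moving_mean)

print("after   10 batches:", history[9])
print("after  100 batches:", history[99])
print("after 1000 batches:", history[999])
```

With this momentum the running mean covers only about 10% of the gap after 10 batches and about 63% after 100, so evaluation right after a handful of batches still uses mostly the old statistics.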

I haven't tried the second solution yet, but I'm pretty sure it will work, since forward propagation during training and prediction will be the same.

Update: I found a great blog post where this issue is discussed in full detail. Check it out here.

score 2 · Accepted Answer

"But dropout layers usually create the opposite effect, making the loss on evaluation lower than the loss during training."

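Not necessarily! Although some of the neurons are dropped in a dropout layer, bear in mind that the output is rescaled according to the dropout rate. At inference time (i.e. test time) dropout is removed entirely, and considering that you have trained your model for only one epoch, the behaviour you saw may well happen. Don't forget that, since you trained the model for just one epoch, only a portion of the neurons were dropped in the dropout layer during training, whereas all of them are present at inference time.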
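The rescaling mentioned above can be sketched in NumPy. This is an illustration of "inverted" dropout, where kept activations are scaled up by 1 / (1 - rate) at training time. The function name `dropout_forward` and the data are assumptions, and it is not Keras's actual code.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: rescale at training time, do nothing at inference."""
    if not training:
        return x
    keep_mask = rng.random(x.shape) >= rate
    # Kept activations are scaled up by 1 / (1 - rate), so the expected
    # value of each output matches the inference-time output.
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(100_000)

out_train = dropout_forward(x, rate=0.5, training=True, rng=rng)
out_infer = dropout_forward(x, rate=0.5, training=False, rng=rng)

print("fraction zeroed in training:", np.mean(out_train == 0.0))
print("mean in training mode:      ", out_train.mean())
print("mean in inference mode:     ", out_infer.mean())
```

On average the two modes agree, but any single training pass uses one random mask, so the loss of one training batch need not match the inference-time loss.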

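If you continue training the model for more epochs, you can expect the training loss and the test loss (on the same data) to become more or less the same.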

Experiment with it yourself: just set the trainable parameter of the Dropout layer(s) to False and see whether this happens or not.

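One may be confused (as I was) by seeing that, after one epoch of training, the training loss is not equal to the evaluation loss on the same batch of data. This is not specific to models with Dropout or BatchNormalization layers. Consider this example: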

from keras import layers, models
import numpy as np

model = models.Sequential()
model.add(layers.Dense(1000, activation='relu', input_dim=100))
model.add(layers.Dense(1))

model.compile(loss='mse', optimizer='adam')
x = np.random.rand(32, 100)
y = np.random.rand(32, 1)

print("Training:")
model.fit(x, y, batch_size=32, epochs=1)

print("\nEvaluation:")
loss = model.evaluate(x, y)
print(loss)

Output:

Training:
Epoch 1/1
32/32 [==============================] - 0s 7ms/step - loss: 0.1520

Evaluation:
32/32 [==============================] - 0s 2ms/step
0.7577340602874756

So why are the losses different if they are computed on the same data, i.e. 0.1520 != 0.7577?

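If you are asking this, it is because you, like me, have not paid enough attention: 0.1520 is the loss before the model parameters are updated (i.e. before the backward pass, or backpropagation), and 0.7577 is the loss after the model weights have been updated. Even though the data is the same, the state of the model when these loss values are computed is not. (Another question: why did the loss increase after backpropagation? Simply because you have trained for only one epoch, so the weight updates are not yet stable.)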
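The ordering described above can be reproduced by hand with a linear model and a single gradient descent step. This is a sketch under stated assumptions: it uses plain gradient descent rather than adam, and the names `loss_reported` and `loss_evaluated`, the data and the learning rate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((32, 10))
y = rng.random((32, 1))
w = rng.normal(scale=0.1, size=(10, 1))
lr = 0.01  # illustrative learning rate

def mse(x, y, w):
    return float(np.mean((x @ w - y) ** 2))

# Forward pass on the single training batch: this is the value a progress
# bar would report as the training loss.
loss_reported = mse(x, y, w)

# Backward pass and weight update (plain gradient descent for simplicity).
grad = 2.0 * x.T @ (x @ w - y) / len(y)
w = w - lr * grad

# Evaluating afterwards uses the updated weights, hence a different value.
loss_evaluated = mse(x, y, w)

print("reported during training:", loss_reported)
print("evaluated after update:  ", loss_evaluated)
```

With a small gradient descent step the second value is lower than the first. A step that is too large can make it higher instead, but either way the two numbers describe different weights.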

To confirm this, you can also use the same batch of data as the validation data:

model.fit(x, y, batch_size=32, epochs=1, validation_data=(x,y))

If you run the code above with this modified line, you will get output like the following (the exact values will obviously differ for you):

Training:
Train on 32 samples, validate on 32 samples
Epoch 1/1
32/32 [==============================] - 0s 15ms/step - loss: 0.1273 - val_loss: 0.5344

Evaluation:
32/32 [==============================] - 0s 89us/step
0.5344240665435791

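You can see that the validation loss and the evaluation loss are exactly the same: that is because validation is performed at the end of the epoch (i.e. once the model weights have already been updated).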
