My model is CNN-based, with several batch-normalization (BN) and dropout (DO) layers. Initially, I accidentally placed model.train() outside the loop, like this:

model.train()    # called once, before the loop
for e in range(num_epochs):
    # train model
    model.eval()    # switches to eval mode; nothing ever switches it back
    # eval model

For the record, the code above trained well and performed decently on the validation set:

[CV:02][E:001][I:320/320] avg. Loss: 0.460897, avg. Acc: 0.742746, test. acc: 0.708046(max: 0.708046)
[CV:02][E:002][I:320/320] avg. Loss: 0.389883, avg. Acc: 0.798791, test. acc: 0.823563(max: 0.823563)
[CV:02][E:003][I:320/320] avg. Loss: 0.319034, avg. Acc: 0.825559, test. acc: 0.834914(max: 0.834914)
[CV:02][E:004][I:320/320] avg. Loss: 0.301322, avg. Acc: 0.834254, test. acc: 0.834052(max: 0.834914)
[CV:02][E:005][I:320/320] avg. Loss: 0.292184, avg. Acc: 0.839575, test. acc: 0.835201(max: 0.835201)
[CV:02][E:006][I:320/320] avg. Loss: 0.285467, avg. Acc: 0.842266, test. acc: 0.837931(max: 0.837931)
[CV:02][E:007][I:320/320] avg. Loss: 0.279607, avg. Acc: 0.844917, test. acc: 0.829885(max: 0.837931)
[CV:02][E:008][I:320/320] avg. Loss: 0.275252, avg. Acc: 0.846443, test. acc: 0.827874(max: 0.837931)
[CV:02][E:009][I:320/320] avg. Loss: 0.270719, avg. Acc: 0.848150, test. acc: 0.822989(max: 0.837931)

However, while reviewing the code, I realized I had made a mistake: the code above switches the BN and DO layers off after the first epoch's training pass.
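(In case it is useful: in PyTorch, train() and eval() just flip the module's boolean training flag, which controls BN and dropout behavior, and the flag stays wherever it was last set. A minimal sketch with a toy model:

import torch.nn as nn

# toy model containing the two layer types affected by train()/eval()
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(0.5))

model.train()
print(model.training)  # True: BN uses batch stats, dropout is active
model.eval()
print(model.training)  # False, and stays False until train() is called again

So with my original code, every epoch after the first was effectively trained in eval mode.)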

So I moved the model.train() call inside the loop:

for e in range(num_epochs):
    model.train()    # training mode for the training pass
    # train model
    model.eval()     # eval mode for the validation pass
    # eval model
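
Spelled out, that now follows the standard pattern. Here is a minimal runnable sketch of the same arrangement; the model, data, and hyperparameters are placeholders, not my actual setup:

import torch
import torch.nn as nn

# placeholder model and dummy data, just to make the pattern concrete
model = nn.Sequential(nn.Linear(8, 2), nn.BatchNorm1d(2), nn.Dropout(0.3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loader = [(torch.randn(16, 8), torch.randint(0, 2, (16,))) for _ in range(4)]
val_loader = [(torch.randn(16, 8), torch.randint(0, 2, (16,)))]

for e in range(3):
    model.train()              # dropout active, BN uses batch statistics
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()               # dropout off, BN uses running statistics
    with torch.no_grad():      # gradients are not needed for validation
        acc = sum((model(x).argmax(1) == y).float().mean().item()
                  for x, y in val_loader) / len(val_loader)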

At this point the model learned comparatively poorly. It looks like overfitting, as you can see in the output below: training accuracy is higher, but validation accuracy is noticeably lower, which is where it starts getting strange, considering the usual effect of BN and DO:

[CV:02][E:001][I:320/320] avg. Loss: 0.416946, avg. Acc: 0.750477, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.329121, avg. Acc: 0.798992, test. acc: 0.690948(max: 0.690948)
[CV:02][E:003][I:320/320] avg. Loss: 0.305688, avg. Acc: 0.829053, test. acc: 0.719540(max: 0.719540)
[CV:02][E:004][I:320/320] avg. Loss: 0.290048, avg. Acc: 0.840539, test. acc: 0.741954(max: 0.741954)
[CV:02][E:005][I:320/320] avg. Loss: 0.279873, avg. Acc: 0.848872, test. acc: 0.745833(max: 0.745833)
[CV:02][E:006][I:320/320] avg. Loss: 0.270934, avg. Acc: 0.854274, test. acc: 0.742960(max: 0.745833)
[CV:02][E:007][I:320/320] avg. Loss: 0.263515, avg. Acc: 0.856945, test. acc: 0.741667(max: 0.745833)
[CV:02][E:008][I:320/320] avg. Loss: 0.256854, avg. Acc: 0.858672, test. acc: 0.734483(max: 0.745833)
[CV:02][E:009][I:320/320] avg. Loss: 0.252013, avg. Acc: 0.861363, test. acc: 0.723707(max: 0.745833)
[CV:02][E:010][I:320/320] avg. Loss: 0.245525, avg. Acc: 0.865519, test. acc: 0.711494(max: 0.745833)

So I thought to myself, "I guess the BN and DO layers are hurting my model," and removed them. However, the model performed poorly without them; in fact, it seemed to learn nothing at all:

[CV:02][E:001][I:320/320] avg. Loss: 0.552687, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503373, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502966, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502870, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502832, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:007][I:320/320] avg. Loss: 0.502800, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:008][I:320/320] avg. Loss: 0.502765, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)

At this point I was thoroughly confused. I went a step further and ran another experiment: I put the BN and DO layers back into the model and tested the following:

for e in range(num_epochs):
    model.eval()    # eval mode for both the training and the eval pass
    # train model
    # eval model

It did not work well:

[CV:02][E:001][I:320/320] avg. Loss: 0.562196, avg. Acc: 0.744774, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506071, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502916, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502859, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502838, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
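
(One detail worth noting: model.eval() does not disable gradient computation, so the weights were still being updated in this experiment, which is presumably why the loss still creeps down a little. A minimal sketch of the distinction, with a placeholder model:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(0.5))
x = torch.randn(3, 4)

model.eval()
y = model(x)
print(y.requires_grad)  # True: eval() does not stop autograd

with torch.no_grad():   # this, not eval(), disables gradient tracking
    y = model(x)
print(y.requires_grad)  # False
)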

I ran the experiments above several times, and the results never strayed far from the outputs posted here. (The data I am working with is quite simple.)

To sum up, the model performs best under a very peculiar setup:

  1. Batch Normalization and Dropout layers are in the model (this part is fine).
  2. The model is trained with model.train() during the first epoch only (strange, especially combined with 3).
  3. The model is trained with model.eval() for all remaining epochs (also strange; see the sketch after this list for what that means).
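
For anyone reasoning about points 2 and 3: the practical difference between the two modes, for a BN layer, is which statistics it normalizes with. A minimal sketch (the model and numbers are illustrative only):

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 2  # a batch far from zero mean / unit variance

bn.train()
y_train = bn(x)  # normalized with this batch's own statistics
bn.eval()
y_eval = bn(x)   # normalized with the stored running estimates
print(y_train.mean().item())  # ~0: batch stats recenter the batch exactly
print(y_eval.mean().item())   # clearly non-zero: running stats lag the data

Dropout, meanwhile, is simply inactive in eval mode, so under this setup it was only ever applied during the first epoch.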

Honestly, I would never deliberately set up a training procedure like this (and I doubt anyone would), but for some reason it works well. Has anyone experienced something similar? Or, if you could walk me through why the model behaves this way, I would really appreciate it!

Thanks in advance!!
