tensorflow - CNN + LSTM image model performs poorly on the validation dataset

Question

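My training and loss curves look like the ones below. Yes, similar graphs have drawn comments such as "classic overfitting", and I understand that.

My model looks like this: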

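import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten,
                                     GlobalAveragePooling2D, LSTM, MaxPooling2D)

input_shape_0 = keras.Input(shape=(3,100, 100, 1), name="img3")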

model = tf.keras.layers.TimeDistributed(Conv2D(8, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)

model = tf.keras.layers.TimeDistributed(Conv2D(16, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)

model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)

model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(Flatten())(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.4))(model)

model = LSTM(16, kernel_regularizer=tf.keras.regularizers.l2(0.007))(model)

# model = Dense(100, activation="relu")(model)
# model = Dense(200, activation="relu",kernel_regularizer=tf.keras.regularizers.l2(0.001))(model)
model = Dense(60, activation="relu")(model)
# model = Flatten()(model)

model = Dropout(0.15)(model)
out = Dense(30, activation='softmax')(model)

model = keras.Model(inputs=input_shape_0, outputs = out, name="mergedModel")

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr

opt = tf.keras.optimizers.RMSprop()
lr_metric = get_lr_metric(opt)
# merged.compile(loss='sparse_categorical_crossentropy',
#                optimizer='adam', metrics=['accuracy'])
model.compile(loss='sparse_categorical_crossentropy', 
                optimizer=opt, metrics=['accuracy',lr_metric])
model.summary()

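In the model-building code above, please treat the commented-out lines as some of the things I have tried so far.

I have followed the suggestions given in answers and comments to questions of this kind, but none of them seems to work for me. Maybe I am missing something really important?

Things I have tried:

Dropout at different places and with different rates.
Including and excluding dense layers, and varying their number of units.
Different numbers of units on the LSTM layer (starting from as low as 1; it is now 16, which gave me the best performance).
Came across weight regularization techniques and tried to implement them as shown in the code above, placing them at different layers. (I need to know which technique to use rather than relying on simple trial and error, which is what I did, and it seems wrong.)
Implemented a learning rate scheduler, which I use to reduce the learning rate after a certain number of epochs as training progresses.
Tried two LSTM layers, the first one with return_sequences=True.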
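For reference, a learning-rate schedule of the kind mentioned in the list above can be written as a plain function. The sketch below is a minimal step-decay schedule; the name `step_decay`, the initial rate, the decay factor and the step size are illustrative assumptions, not the values used in the question. A function with this `(epoch, lr)` signature is what `tf.keras.callbacks.LearningRateScheduler` expects.

```python
def step_decay(epoch, lr, initial_lr=1e-3, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for `epoch` (0-based).

    `lr` (the current rate) is accepted only so the signature matches what
    tf.keras.callbacks.LearningRateScheduler passes in; it is not used.
    All default values are illustrative assumptions.
    """
    return initial_lr * (drop ** (epoch // epochs_per_drop))


# Learning rate at a few selected epochs.
schedule_preview = {epoch: step_decay(epoch, lr=None) for epoch in (0, 9, 10, 25)}
```

It could then be attached to training with `callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)]` in `model.fit`.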

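After all of this, I still cannot overcome the overfitting problem. My dataset is properly shuffled and split with an 80/20 train/validation ratio.

Data augmentation is another thing I found commonly suggested and have not tried yet. I first want to see whether I am making some mistake that I can correct, so that I can avoid diving into data augmentation for now. My dataset has the following sizes: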

Training images: 6780
Validation images: 1484

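The numbers shown are samples, and each sample has 3 images. So basically, I feed 3 images at once as one sample to my time-distributed CNN, which is followed by the other layers shown in the model description. Accordingly, I have 6780 * 3 training images and 1484 * 3 validation images. Each image is 100 * 100 and has 1 channel.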
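To make the input layout concrete, here is a small NumPy sketch of how a flat stack of images could be grouped into samples of 3 that match `keras.Input(shape=(3, 100, 100, 1))`. It uses 4 samples instead of 6780, and it assumes that every 3 consecutive images belong to the same sample; the question does not say how the images are stored.

```python
import numpy as np

# Small stand-in for the real data: 4 samples instead of 6780 (assumption).
num_samples, frames, height, width, channels = 4, 3, 100, 100, 1

# Flat stack of grayscale images, ordered so that every 3 consecutive
# images belong to the same sample (an assumption about the storage order).
flat_images = np.zeros((num_samples * frames, height, width, channels),
                       dtype=np.float32)

# Group every 3 consecutive images into one time-distributed sample.
x = flat_images.reshape(num_samples, frames, height, width, channels)

# One integer class label per sample, as sparse_categorical_crossentropy expects.
y = np.zeros((num_samples,), dtype=np.int64)
```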

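I am using RMSprop as the optimizer, since it performed better than adam in my tests.

Update

I tried some different architectures and some regularization and dropout at different places, and I am now able to reach a val_acc of 59%. Below is the new model.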

#  kernel_regularizer=tf.keras.regularizers.l2(0.004)
# kernel_constraint=max_norm(3)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)

model = tf.keras.layers.TimeDistributed(Conv2D(64, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)

model = tf.keras.layers.TimeDistributed(Conv2D(128, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)


model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)

model = tf.keras.layers.TimeDistributed(GlobalAveragePooling2D())(model)

model = LSTM(128, return_sequences=True,kernel_regularizer=tf.keras.regularizers.l2(0.040))(model)
model = Dropout(0.60)(model)
model = LSTM(128, return_sequences=False)(model)
model = Dropout(0.50)(model)
out = Dense(30, activation='softmax')(model)

score 1 · Accepted Answer

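According to the following papers, there are many ways to prevent overfitting:

Dropout layers (disabling random neurons). https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Input noise (e.g. random Gaussian noise on the images). https://arxiv.org/pdf/2010.07532.pdf
Random data augmentation (e.g. rotating, shifting, scaling, etc.). https://arxiv.org/pdf/1906.11052.pdf
Adjusting the number of layers and units. https://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Regularization functions (e.g. L1, L2, etc.). https://www.researchgate.net/publication/329150256_A_Comparison_of_Regularization_Techniques_in_Deep_Neural_Networks
Early stopping: if you notice that for N consecutive epochs your model's training loss keeps decreasing while the model performs poorly on the validation dataset, that is a good sign to stop training.
Shuffling the training data or K-fold cross-validation is also a common way of dealing with overfitting.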
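The early-stopping rule from the list above can be sketched in a few lines of plain Python. This is only an illustration of the logic, with a hypothetical helper name and made-up loss values; in Keras the same idea is available as `tf.keras.callbacks.EarlyStopping` (with `monitor`, `patience` and `restore_best_weights` arguments).

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: they improve for three epochs, then drift upward.
history = [2.9, 2.4, 2.1, 2.2, 2.3, 2.3, 2.5, 2.6]
stop_at = early_stopping_epoch(history, patience=3)
```

With these made-up numbers the best validation loss is at epoch 2, and training would stop three epochs later.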

I found this great repository, which contains examples of how to implement data augmentation: https://github.com/kochlisGit/random-data-augmentations
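As a rough idea of what such augmentation could look like for the 3-image samples in the question, here is a minimal NumPy sketch combining a random shift with Gaussian input noise. It is not taken from the linked repository; the function name, the noise level, the shift range and the assumption that pixel values are floats in [0, 1] are all illustrative.

```python
import numpy as np


def augment_sample(sample, rng, noise_std=0.05, max_shift=5):
    """Randomly shift and add Gaussian noise to one (3, 100, 100, 1) sample.

    The same shift is applied to all frames so their relative alignment is
    kept. Pixel values are assumed to be floats in [0, 1].
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    # np.roll wraps pixels around the border; a crude stand-in for a
    # proper translation with padding.
    shifted = np.roll(sample, shift=(dy, dx), axis=(1, 2))
    noisy = shifted + rng.normal(0.0, noise_std, size=sample.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)


rng = np.random.default_rng(0)
sample = rng.random((3, 100, 100, 1), dtype=np.float32)  # random stand-in image data
augmented = augment_sample(sample, rng)
```

Applying a fresh random augmentation each epoch, rather than once up front, means the network rarely sees exactly the same input twice.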

Also, this repository seems to have examples of CNNs that implement most of the methods above: https://github.com/kochlisGit/Tensorflow-State-of-the-Art-Neural-Networks

score 1 · Accepted Answer

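Try performing data augmentation as a preprocessing step. A lack of data samples can lead to curves like these. You can also try k-fold cross-validation.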
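A minimal sketch of how k-fold splits could be generated at the sample level, so that the 3 images of a sample always stay together. The helper name, `k=5` and the seed are assumptions; scikit-learn's `KFold` offers the same functionality if that library is available.

```python
import numpy as np


def k_fold_indices(num_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# 6780 + 1484 = 8264 samples in total (figures from the question).
splits = list(k_fold_indices(8264, k=5))
```

Each pair of index arrays can be used to slice the sample array and the labels; a fresh model would be trained per fold and the validation scores averaged.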

score 0 · Accepted Answer

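The goal should be for the model to predict correctly regardless of the order in which the 3 images in a sample are arranged.

If the order of the images in each sample does not matter for training, I think your model does the opposite: the time-distributed layers followed by the LSTM take the order of the three images into account. As a solution, first, you could add data by reordering the images of each sample (i.e. augmenting the data). Second, try treating the three images as a single image with three channels and remove the TimeDistributed layers (I am not sure whether three channels is more effective, but you can give it a try).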
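Both suggestions can be sketched with NumPy on arrays shaped like the question's input, `(N, 3, 100, 100, 1)`. The helper names and the tiny random data are assumptions for illustration only.

```python
import itertools

import numpy as np


def permute_frames(x, y):
    """Return every ordering of the 3 frames of each sample, with labels repeated."""
    orders = list(itertools.permutations(range(x.shape[1])))
    x_aug = np.concatenate([x[:, list(order)] for order in orders], axis=0)
    y_aug = np.tile(y, len(orders))
    return x_aug, y_aug


def frames_to_channels(x):
    """Turn (N, 3, H, W, 1) into (N, H, W, 3) so a plain 2D CNN can be used."""
    # Drop the singleton channel axis, then move the frame axis to the end.
    return np.moveaxis(x[..., 0], 1, -1)


# Tiny random stand-in for the real data (assumption).
rng = np.random.default_rng(0)
x = rng.random((2, 3, 100, 100, 1))
y = np.array([4, 7])

x_aug, y_aug = permute_frames(x, y)   # 6 orderings per sample
x_channels = frames_to_channels(x)    # input for a model without TimeDistributed
```

Reordering multiplies the training set by 6, but the copies are highly correlated, so it should only be applied to the training split and not to the validation data.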
