添加 dropout 层使 val 损失保持低于训练损失,在此期间存在恒定的泛化差距是否例外?
这是架构:
tf.keras.layers.CuDNNLSTM(1024,input_shape=(9,41),return_sequences=True) ,
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.CuDNNLSTM(512, return_sequences=True),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.CuDNNLSTM(256),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.4),
tf.keras.layers.Dense(3, activation=tf.nn.softmax)