
Out of curiosity, I compared a stacked LSTM network fed a single time step against an MLP with tanh activations, expecting them to perform about the same. The architectures used for the comparison are shown below; both were trained on the same regression dataset with MSE as the loss function:

# MLP: five fully connected tanh layers followed by a linear output
from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(Dense(50, input_dim=num_features, activation='tanh'))
model.add(Dense(100, activation='tanh'))
model.add(Dense(150, activation='tanh'))
model.add(Dense(100, activation='tanh'))
model.add(Dense(50, activation='tanh'))
model.add(Dense(1))


# Stacked LSTM: same layer widths, fed sequences of length 1
# (inputs reshaped to (samples, 1, num_features))
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(None, num_features)))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(150, return_sequences=True))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
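
Both models are trained the same way; a minimal sketch of the setup, assuming Keras (only the MSE loss is stated above, so the Adam optimizer, the batch size, and the placeholder names X_train/y_train/X_val/y_val are illustrative assumptions):

# Only MSE is given in the post; the optimizer, batch size, and data
# variable names below are assumptions for illustration.
# For the LSTM, X_train/X_val are reshaped to (samples, 1, num_features).
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=20, batch_size=32)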

Surprisingly, the LSTM model's loss decreased much faster than the MLP's:

MLP loss:
Epoch   Training Loss   Validation Loss
1       0.011504        0.010708
2       0.010739        0.010623
3       0.010598        0.010189
4       0.010046        0.009651
5       0.009305        0.008502
6       0.007388        0.004334
7       0.002576        0.001686
8       0.001375        0.001217
9       0.000921        0.000916
10      0.000696        0.000568
11      0.000560        0.000479
12      0.000493        0.000451
13      0.000439        0.000564
14      0.000402        0.000478
15      0.000377        0.000366
16      0.000351        0.000240
17      0.000340        0.000352
18      0.000327        0.000203
19      0.000311        0.000323
20      0.000299        0.000264

LSTM loss:
Epoch   Training Loss   Validation Loss
1       0.011345        0.010634
2       0.008128        0.003692
3       0.001488        0.000668
4       0.000440        0.000232
5       0.000260        0.000160
6       0.000200        0.000137
7       0.000165        0.000093
8       0.000140        0.000104
9       0.000127        0.000139
10      0.000116        0.000091
11      0.000106        0.000095
12      0.000099        0.000082
13      0.000091        0.000135
14      0.000085        0.000099
15      0.000082        0.000055
16      0.000079        0.000062
17      0.000075        0.000045
18      0.000073        0.000121
19      0.000069        0.000045
20      0.000065        0.000052

After 100 epochs, the MLP's validation loss only comes down to roughly 1e-4, while the LSTM's reaches roughly 1e-5. It is not obvious to me why the two architectures should behave differently, since with a single time step the LSTM cells never use any memory from previous time steps. On top of that, the MLP trains about 3x faster than the LSTM. Can someone explain the math behind this?
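
For reference, even with a single time step the corresponding layers do not contain the same number of trainable weights; a small sketch (assuming Keras and a hypothetical num_features = 10, chosen only for illustration) comparing one 50-unit tanh Dense layer with one 50-unit LSTM layer:

# Sketch: compare trainable parameter counts of one Dense vs. one LSTM layer.
# num_features = 10 is a made-up value, used only for this illustration.
from keras.models import Sequential
from keras.layers import Dense, LSTM

num_features = 10

dense = Sequential([Dense(50, input_dim=num_features, activation='tanh')])
lstm = Sequential([LSTM(50, input_shape=(1, num_features))])

# Dense: num_features*50 + 50             = 550
# LSTM:  4*(num_features*50 + 50*50 + 50) = 12200 (four gate weight sets,
#        including recurrent weights that are trained even at sequence length 1)
print(dense.count_params())  # 550
print(lstm.count_params())   # 12200

So each LSTM layer carries four gate weight matrices plus recurrent weights, meaning the two stacks are far from parameter-matched even though the layer widths line up.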
