keras - Why do I get completely different predictions from a Keras LSTM network depending on the order of prediction?

Question

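I have a script that builds an LSTM model, fits it to training data, and predicts some test data. (I also plot predictions on the training data, since they should be close to the training targets, just to check whether my model is built properly.)

1) The first problem is that the predictions for the test and training data are completely different depending on whether I predict on the training data first or on the test data first.

2) The second problem may be related to the first: every time I run my script, the predictions for the test data are completely different. I know that neural networks involve some randomness, but as my result plots show, the predictions are completely different.

Edit 1: I tried setting 'stateful = False' as suggested in the comments, but without success.

Edit 2: I updated the script and the plots, and the new code includes some basic sine-wave sample data. The problem persists even in that simple example.

My input signal X is a sine wave with 100 time steps and a random amplitude and frequency. My target y is correlated with X at every time step and is, in this case, also a sine wave. The shapes of my data are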

X_train.shape = (100, 1, 1)
y_train.shape = (100,)
X_test.shape = (100, 1, 1)
y_test.shape = (100,)

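I am using an LSTM network to try to fit the complete sine wave, so the training batch size is 100, and to predict every single point of the test signal, so the prediction batch size is 1. In addition, I manually reset the states of the LSTM after each epoch, as described here: https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/

To build my network, I followed the "keras-rules" mentioned here: Delayed echo of sin - cannot reproduce Tensorflow result in Keras

I know about the basic approaches for solving this kind of problem, like those suggested here: Wrong predictions with LSTM Neural Network, but nothing worked for me.

I would appreciate any help with this, as well as advice on asking better questions in case I did something wrong, since this is my first post on Stack Overflow.

Thank you all! Here is my code example: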

import numpy as np
import matplotlib.pyplot as plt
from keras import models, layers, optimizers
from keras.callbacks import Callback


# create training sample data
Fs = 100  # sample rate
z = np.arange(100)
f = 1  # frequency in Hz
X_train = np.sin(2 * np.pi * f * z / Fs)
y_train = 0.1 * np.sin(2 * np.pi * f * z / Fs)


# create test sample data
f = 1  # frequency in Hz
X_test = np.sin(2 * np.pi * f * z / Fs) * 2
y_test = 0.2 * np.sin(2 * np.pi * f * z / Fs)


# convert data into LSTM compatible format
y_train = np.array(y_train)
y_test = np.array(y_test)
X_train = X_train.reshape(X_train.shape[0], 1, 1)
X_test = X_test.reshape(X_test.shape[0], 1, 1)


# build and compile model
model = models.Sequential()
model.add(layers.LSTM(1, batch_input_shape=(len(X_train), X_train.shape[1], X_train.shape[2]),
                      return_sequences=False, stateful=False))
model.add(layers.Dense(X_train.shape[1], input_shape=(1,), activation='linear'))
model.compile(optimizer=optimizers.Adam(lr=0.01, decay=0.008, amsgrad=True), loss='mean_squared_error', metrics=['mae'])


# construct a class for keras callbacks, to make sure the cell state is reset after each epoch
class ResetStatesAfterEachEpoch(Callback):
    def on_epoch_end(self, epoch, logs=None):
        self.model.reset_states()

reset_state = ResetStatesAfterEachEpoch()
callbacks = [reset_state]


# fit model to training data
history = model.fit(X_train, y_train, epochs=20000, batch_size=len(X_train),
                        shuffle=False, callbacks=callbacks)


# re-define LSTM model with weights of fit model to predict for 1 point, so also re-define the batch size to 1
new_batch_size = 1
new_model = models.Sequential()
new_model.add(layers.LSTM(1, batch_input_shape=(new_batch_size, X_test.shape[1], X_test.shape[2]), return_sequences=False,
                          stateful=False))
new_model.add(layers.Dense(X_test.shape[1], input_shape=(1,), activation='linear'))

# copy weights to new model
old_weights = model.get_weights()
new_model.set_weights(old_weights)


# single point prediction on train data
y_pred_train = new_model.predict(X_train, batch_size=new_batch_size)

# single point prediction on test data
y_pred_test = new_model.predict(X_test, batch_size=new_batch_size)

# plot predictions
plt.figure()
plt.plot(y_test, 'r', label='ground truth test',
         linestyle='dashed', linewidth=0.8)
plt.plot(y_train, 'b', label='ground truth train',
         linestyle='dashed', linewidth=0.8)
plt.plot(y_pred_test, 'g',
         label='y pred test', linestyle='dotted',
         linewidth=0.8)
plt.plot(y_pred_train, 'k',
         label='y pred train', linestyle='-.',
         linewidth=0.8)
plt.title('pred order: train, test')  # matches the order of the predict() calls above
plt.xlabel('time steps')
plt.ylabel('y')
plt.legend(prop={'size': 8})
plt.show()

score 1 · Accepted Answer

The problem is here:

model.add(layers.LSTM(1, batch_input_shape=(len(X_train), X_train.shape[1], X_train.shape[2]),
                      return_sequences=False, stateful=True))

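You set stateful=True in the LSTM layer, which means that the hidden state is not reset after each prediction. That explains the effect you are seeing. If you don't want this behavior, set it to the default, stateful=False, and the layer will work as a standard stateless LSTM.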
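To illustrate why a carried-over state makes the prediction order matter, here is a minimal pure-Python sketch of a single-unit LSTM fed one time step per sample, as in the question's (100, 1, 1) data with a prediction batch size of 1. The weights in W and the helpers lstm_step, predict, and train_predictions are hypothetical and are not Keras internals or the question's trained model; the sketch only mimics the idea that a stateful layer reuses the final state of one batch as the initial state of the next.

```python
import math

# Hypothetical fixed weights for a single-unit LSTM (not the question's trained model).
W = {"wi": 0.5, "ui": 0.5, "bi": 0.0,   # input gate
     "wf": 0.5, "uf": 0.5, "bf": 0.0,   # forget gate
     "wo": 0.5, "uo": 0.5, "bo": 0.0,   # output gate
     "wg": 0.5, "ug": 0.5, "bg": 0.0}   # candidate cell value


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, w):
    """Advance a scalar LSTM cell by one time step and return the new (h, c)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def predict(xs, w, stateful, state=(0.0, 0.0)):
    """Predict one sample at a time; each sample is a single time step."""
    h, c = state
    outputs = []
    for x in xs:
        if not stateful:
            h, c = 0.0, 0.0          # stateless: every sample starts from zeros
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs, (h, c)


train = [math.sin(2 * math.pi * t / 100) for t in range(100)]
test = [2 * v for v in train]


def train_predictions(stateful, test_first):
    """Predict on the training signal, optionally after predicting on the test signal."""
    state = (0.0, 0.0)
    if test_first:
        _, state = predict(test, W, stateful, state)
    outputs, _ = predict(train, W, stateful, state)
    return outputs


# Compare the training predictions for both prediction orders.
print("stateless, same result in both orders:",
      train_predictions(False, False) == train_predictions(False, True))
print("stateful, same result in both orders:",
      train_predictions(True, False) == train_predictions(True, True))
```

In this sketch the stateless variant gives identical training predictions regardless of order, while the stateful variant does not, because its first training sample starts from whatever state the test run left behind.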

score 0 · Accepted Answer

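So I found a solution. I don't know why it works (if anyone does and could leave a comment, I would appreciate it), but it does work.

I added the derivative of X_train (here a cosine) as a second feature, so I got a multi-input LSTM with 2 features. The final X_train is built as sketched in this code: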

x = np.sin(2 * np.pi * f * z / Fs)
dx_dt = np.cos(2 * np.pi * f * z / Fs)
X_train = np.column_stack((x, dx_dt))

Even a time-shifted y such as y_train = 5 * np.sin(2 * np.pi * f * (z + 51) / Fs) is predicted well after training for 3000 epochs, using 1 LSTM layer with 3 neurons.

Here is the resulting plot.
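One possible explanation, offered as an assumption rather than something the answer confirms: with a single time step per sample, the sine value alone does not determine the phase, so two different points on the wave can share the same input but need different targets. Adding the cosine removes that ambiguity, and by the identity sin(a + b) = sin(a)cos(b) + cos(a)sin(b) a time-shifted sine is an exact linear combination of the two features. The pure-Python check below uses the values from the answer (Fs = 100, f = 1, shift 51, amplitude 5); the names phi, a, b, and max_err are introduced here for illustration only.

```python
import math

Fs = 100     # sample rate, as in the question
f = 1        # frequency in Hz
shift = 51   # time shift from the answer's example
amp = 5      # target amplitude from the answer's example


def x_of(z):
    return math.sin(2 * math.pi * f * z / Fs)


def dx_of(z):
    return math.cos(2 * math.pi * f * z / Fs)


def y_of(z):
    return amp * math.sin(2 * math.pi * f * (z + shift) / Fs)


# 1) The sine value alone is ambiguous: z = 10 and z = 40 give the same x
#    (up to rounding) but clearly different targets.
print("x:", x_of(10), x_of(40))
print("y:", y_of(10), y_of(40))

# 2) With both features, the shifted target is a fixed linear combination
#    y = a * x + b * dx_dt, where a and b depend only on the shift.
phi = 2 * math.pi * f * shift / Fs
a = amp * math.cos(phi)
b = amp * math.sin(phi)

max_err = max(abs(y_of(z) - (a * x_of(z) + b * dx_of(z))) for z in range(100))
print("largest deviation from the linear combination:", max_err)
```

If this reading is right, the network with two input features only has to learn a mapping that is close to linear, which would be consistent with the good fit reported above.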
