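I am training an LSTM model on standardized predictors from the training set. After predicting the outcome for the test set, I need to transform the predicted scores back to the original scale. Normally I can do this as predicted score * SD of the training outcome + MEAN of the training outcome. With the LSTM, however, each feature in the training set is measured multiple times (once per time step), so the standardization returns multiple MEANs and SDs. My questions are: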
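1. How can I use Python code to transform the outcome back to its original scale when there are multiple MEANs and SDs?
2. Alternatively, should I normalize the predictors and the outcome in a different way so that the outcome can be scaled back? Which standardization method would you recommend?
3. In a 3-D array like mine, how can I get one mean and one SD for each feature?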
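A minimal sketch of reversing standardization when there are several means and SDs, using random stand-in data with the same shapes as the question. `np.mean(..., axis=0)` on a `(sequences, 144, features)` array returns one statistic per (time step, feature) pair, shape `(144, features)`, and the scalar formula `pred * SD + MEAN` still works: NumPy broadcasting pairs each element with its own time step's SD and mean. This assumes the predictions have the same `(n, 144, 1)` layout as `train_y`, which only holds if the model emits one value per time step (for example an `LSTM` layer with `return_sequences=True`); `pred_std` below is a hypothetical stand-in for such predictions.

```python
import numpy as np

# Random stand-in data: 10 sequences x 144 time steps x 1 outcome column.
rng = np.random.default_rng(0)
train_y = rng.random((10, 144, 1))

# axis=0 reduces over sequences only, giving one mean/SD per
# (time step, feature) pair: shape (144, 1).
means_y = train_y.mean(axis=0)
stds_y = train_y.std(axis=0)

s_train_y = (train_y - means_y) / stds_y

# Hypothetical sequence-shaped predictions on the standardized scale.
pred_std = s_train_y[:3]  # shape (3, 144, 1)

# Broadcasting (3, 144, 1) against (144, 1) applies each time step's own
# SD and mean, undoing the standardization element by element.
pred_orig = pred_std * stds_y + means_y
print(pred_orig.shape)  # (3, 144, 1)
```

Because the stand-in predictions are just standardized training rows, `pred_orig` reproduces `train_y[:3]` exactly, which is a quick way to check that the inverse is wired correctly.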
Many thanks.
See the reproducible Python code and output below:
>>> import pandas as pd
>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import LSTM, Dense
>>> dat=pd.DataFrame(np.random.rand(2880*4).reshape(2880,4), columns = ['y','x1','x2','x3'])
>>> x=dat.iloc[:,[1,2,3]]
>>> y=dat.iloc[:,0]
>>> dat.head()
          y        x1        x2        x3
0  0.045795  0.974471  0.916503  0.208624
1  0.398229  0.628749  0.630672  0.672327
2  0.015625  0.164637  0.041553  0.057597
3  0.516001  0.377016  0.752409  0.040648
4  0.451607  0.074149  0.413406  0.245180
>>> dat.shape
(2880, 4)
>>> x=x.values
>>> y=y.values
>>> y=np.reshape(y,[y.shape[0],1])
>>> train_x=x[0:1440]
>>> train_y=y[0:1440]
>>> test_x=x[1440:2880]
>>> test_y=y[1440:2880]
>>>
>>> train_x=np.reshape(train_x,[-1,144,train_x.shape[1]])
>>> train_y=np.reshape(train_y,[-1,144,train_y.shape[1]])
>>> test_x=np.reshape(test_x,[-1, 144, test_x.shape[1]])
>>> test_y=np.reshape(test_y,[-1, 144, test_y.shape[1]])
>>> print(train_x.shape,train_y.shape,test_x.shape,test_y.shape)
(10, 144, 3) (10, 144, 1) (10, 144, 3) (10, 144, 1)
>>>
>>>
>>> means_x=np.mean(train_x,axis=0) # standardization; axis=0 gives one mean per (time step, feature), shape (144, 3)
>>> means_y=np.mean(train_y,axis=0)
>>> stds_x=np.std(train_x,axis=0)
>>> stds_y=np.std(train_y,axis=0)
>>> s_train_x=(train_x-means_x)/stds_x
>>> s_test_x=(test_x-means_x)/stds_x
>>> s_train_y=(train_y-means_y)/stds_y
>>> s_test_y=(test_y-means_y)/stds_y
>>>
>>> model=Sequential()
>>> model.add(LSTM(10,input_shape=(s_train_x.shape[1],s_train_x.shape[2])))
>>> model.add(Dense(1))
>>> model.compile(loss='mae',optimizer='adam')
>>> history=model.fit(s_train_x,s_train_y,epochs=50,batch_size=100,validation_data=(s_test_x, s_test_y), shuffle=False)
>>>
>>> y_pred=model.predict(s_test_x)
>>>
>>> y_pred
array([[-0.23145649],
[-0.11043324],
[-0.10545453],
[ 0.13147753],
[-0.2414865 ],
[ 0.04826045],
[-0.35677138],
[-0.11905774],
[-0.01755336],
[ 0.16642463]], dtype=float32)
>>>
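One way to get a single mean and SD per feature, sketched under the assumption of random stand-in data with the question's shapes: reduce over both the sequence axis and the time axis with `axis=(0, 1)`. With `keepdims=True` the statistics have shape `(1, 1, n_features)` and broadcast against any `(n, 144, n_features)` array. Since the outcome has a single feature, its statistics collapse to scalars, so the usual `pred * SD + MEAN` formula applies directly to predictions shaped `(n, 1)` like `y_pred` above, where per-time-step statistics of shape `(144, 1)` would not line up. The `y_pred` array here is a hypothetical stand-in for `model.predict(...)` output.

```python
import numpy as np

rng = np.random.default_rng(0)
train_x = rng.random((10, 144, 3))  # sequences x time steps x features
train_y = rng.random((10, 144, 1))

# Reduce over sequences AND time steps: one mean/SD per feature,
# kept as shape (1, 1, 3) so they broadcast over 3-D arrays.
mean_x = train_x.mean(axis=(0, 1), keepdims=True)
std_x = train_x.std(axis=(0, 1), keepdims=True)
s_train_x = (train_x - mean_x) / std_x

# The outcome has one feature, so its statistics are plain scalars.
mean_y = train_y.mean()
std_y = train_y.std()
s_train_y = (train_y - mean_y) / std_y

# Hypothetical stand-in for model.predict(...): one standardized
# value per sequence, shape (10, 1).
y_pred = rng.standard_normal((10, 1))

# With scalar statistics the inverse works for any prediction shape.
y_pred_orig = y_pred * std_y + mean_y
```

Fitting scikit-learn's `StandardScaler` on `train_x.reshape(-1, 3)` should give the same per-feature statistics, since it also computes column means and population SDs, and its `inverse_transform` undoes the scaling; the NumPy version simply avoids the extra dependency. The test set would be scaled with the same training-set `mean_x`, `std_x`, `mean_y` and `std_y`.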