
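I am training an LSTM model on standardized predictors from the training set. After predicting the outcomes for the test set, I need to transform the predicted scores back to the original scale. Normally I can reverse the scaling with predicted score * SD of the training outcome + MEAN of the training outcome. With an LSTM, however, each feature in the training set is measured many times, so the standardization step returns multiple MEANs and SDs. My questions are: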

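  1. How can I use Python code to transform the outcome back to its original scale when there are multiple MEANs and SDs?

  2. Alternatively, should I choose a different way to normalize the predictors and the outcome so that the outcome can be inverse-scaled? Which standardization method would you recommend?

  3. In a 3-D array like mine, how can I get one mean and one SD for each feature?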
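For question 3, one possible approach is a minimal NumPy sketch, assuming the (samples, time steps, features) layout used in the code below. Passing a tuple `axis=(0, 1)` to `mean`/`std` pools over both the sample axis and the time axis, leaving one statistic per feature. The arrays here are random stand-ins, not the question's data.

```python
import numpy as np

# Hypothetical stand-ins with the question's shape: 10 sequences x 144 steps x 3 features.
rng = np.random.default_rng(1)
train_x = rng.random((10, 144, 3))
test_x = rng.random((10, 144, 3))

# Pool over samples (axis 0) and time steps (axis 1): one mean and one SD per feature.
means_x = train_x.mean(axis=(0, 1))  # shape (3,)
stds_x = train_x.std(axis=(0, 1))    # shape (3,)

# Broadcasting applies the per-feature statistics to every sample and time step.
s_train_x = (train_x - means_x) / stds_x
s_test_x = (test_x - means_x) / stds_x  # reuse the training statistics on the test set

print(means_x.shape, stds_x.shape)  # (3,) (3,)
```

This is equivalent to flattening to 2-D with `train_x.reshape(-1, 3)` and taking `mean(axis=0)`; the tuple form just avoids the reshape.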

Thank you very much.

See the reproducible Python code and output below:


>>> import pandas as pd
>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import LSTM, Dense
>>> from matplotlib import pyplot

>>> dat=pd.DataFrame(np.random.rand(2880*4).reshape(2880,4), columns = ['y','x1','x2','x3'])
>>> x=dat.iloc[:,[1,2,3]]
>>> y=dat.iloc[:,0]
>>> dat.head()
          y        x1        x2        x3
0  0.045795  0.974471  0.916503  0.208624
1  0.398229  0.628749  0.630672  0.672327
2  0.015625  0.164637  0.041553  0.057597
3  0.516001  0.377016  0.752409  0.040648
4  0.451607  0.074149  0.413406  0.245180
>>> dat.shape
(2880, 4)

>>> x=x.values 
>>> y=y.values
>>> y=np.reshape(y,[y.shape[0],1])
>>> train_x=x[0:1440]
>>> train_y=y[0:1440]
>>> test_x=x[1440:2880]
>>> test_y=y[1440:2880]
>>> 
>>> train_x=np.reshape(train_x,[-1,144,train_x.shape[1]])
>>> train_y=np.reshape(train_y,[-1,144,train_y.shape[1]])
>>> test_x=np.reshape(test_x,[-1, 144, test_x.shape[1]])
>>> test_y=np.reshape(test_y,[-1, 144, test_y.shape[1]])
>>> print(train_x.shape,train_y.shape,test_x.shape,test_y.shape)
(10, 144, 3) (10, 144, 1) (10, 144, 3) (10, 144, 1)
>>> 
>>> 
>>> means_x=np.mean(train_x,axis=0) # standardization; axis=0 gives shape (144, 3), one mean per time step per feature
>>> means_y=np.mean(train_y,axis=0)
>>> stds_x=np.std(train_x,axis=0)
>>> stds_y=np.std(train_y,axis=0)
>>> s_train_x=(train_x-means_x)/stds_x
>>> s_test_x=(test_x-means_x)/stds_x
>>> s_train_y=(train_y-means_y)/stds_y
>>> s_test_y=(test_y-means_y)/stds_y
>>> 
>>> model=Sequential()
>>> model.add(LSTM(10,input_shape=(s_train_x.shape[1],s_train_x.shape[2])))
>>> model.add(Dense(1))
>>> model.compile(loss='mae',optimizer='adam')
>>> history=model.fit(s_train_x,s_train_y,epochs=50,batch_size=100,validation_data=(s_test_x, s_test_y), shuffle=False)

>>> 
>>> y_pred=model.predict(s_test_x)
>>> 
>>> y_pred
array([[-0.23145649],
       [-0.11043324],
       [-0.10545453],
       [ 0.13147753],
       [-0.2414865 ],
       [ 0.04826045],
       [-0.35677138],
       [-0.11905774],
       [-0.01755336],
       [ 0.16642463]], dtype=float32)
>>> 
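Note that `y_pred` has shape (10, 1), one value per sequence (the LSTM layer returns only its last output by default), while `means_y` and `stds_y` computed with `axis=0` have shape (144, 1), so the per-time-step statistics do not line up with the predictions. A minimal sketch of inverting with a single pooled mean and SD for the outcome, assuming the model is retrained on targets standardized with these pooled statistics. The `y_pred` values are only placeholders copied from the output above, and `train_y` is random stand-in data.

```python
import numpy as np

# Hypothetical training targets with the question's shape (10, 144, 1).
rng = np.random.default_rng(2)
train_y = rng.random((10, 144, 1))

# One mean and one SD for the single target feature, pooled over samples and time steps.
mean_y = train_y.mean(axis=(0, 1))  # shape (1,)
std_y = train_y.std(axis=(0, 1))    # shape (1,)

# Standardized targets that the (retrained) model would be fit on.
s_train_y = (train_y - mean_y) / std_y

# Placeholder standardized predictions, copied from the output above; shape (10, 1).
y_pred = np.array([[-0.23145649], [-0.11043324], [-0.10545453], [0.13147753],
                   [-0.2414865], [0.04826045], [-0.35677138], [-0.11905774],
                   [-0.01755336], [0.16642463]], dtype=np.float32)

# Invert the standardization: z * SD + MEAN, broadcast over the (10, 1) array.
y_pred_orig = y_pred * std_y + mean_y
print(y_pred_orig.shape)  # (10, 1)
```

With pooled statistics the inverse is the same one-line formula as in the single-mean case, regardless of how many time steps each sequence has.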