Working with a small amount of data and handling overfitting with folds [GridSearchCV]
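I'm completely lost on how to get better estimates from my model. When I run my code, I seem to get a negative accuracy. How can I improve the cross_val_score, or test score, or whatever you want to call it, so that I can predict values more reliably?
I tried adding more data (from 50 to 200+).
I tried random parameters (and realized this is a naive approach).
I also tried cleaning up my features with StandardScaler.
Does anyone have any suggestions?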
from sklearn.neural_network import MLPRegressor
from sklearn import preprocessing
import requests
import json
from calendar import monthrange
import numpy as np
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.preprocessing import scale
r = requests.get('https://www.alphavantage.co/query?function=TIME_SERIES_WEEKLY_ADJUSTED&symbol=W&apikey=QYQ2D6URDOKNUGF4')
#print(r.text)
y = json.loads(r.text)
#print(y["Monthly Adjusted Time Series"].keys())
keysInResultSet = y["Weekly Adjusted Time Series"].keys()
#print(keysInResultSet)
featuresListTemp = []
labelsListTemp = []
count = 0
for i in keysInResultSet:
    #print(i)
    count = count + 1
    #print(y["Monthly Adjusted Time Series"][i])
    tmpList = []
    tmpList.append(count)
    featuresListTemp.append(tmpList)
    strValue = y["Weekly Adjusted Time Series"][i]["5. adjusted close"]
    numValue = float(strValue)
    labelsListTemp.append(numValue)
print("TOTAL SET")
print(featuresListTemp)
print(labelsListTemp)
print("---")
arrTestInput = []
arrTestOutput = []
print("SCALING SET")
X_train = np.array(featuresListTemp)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
print(X_train_scaled)
product_model = MLPRegressor()
#10.0 ** -np.arange(1, 10)
#todo : once found general settings, iterate through some more seeds to find one that can be used on the training
parameters = {'learning_rate': ['constant','adaptive'],'solver': ['lbfgs','adam'], 'tol' : 10.0 ** -np.arange(1, 4), 'verbose' : [True], 'early_stopping': [True], 'activation' : ['tanh','logistic'], 'learning_rate_init': 10.0 ** -np.arange(1, 4), 'max_iter': [4000], 'alpha': 10.0 ** -np.arange(1, 4), 'hidden_layer_sizes':np.arange(1,11), 'random_state':np.arange(1, 3)}
clf = GridSearchCV(product_model, parameters, n_jobs=-1)
clf.fit(X_train_scaled, labelsListTemp)
print(clf.score(X_train_scaled, labelsListTemp))
print(clf.best_params_)
best_params = clf.best_params_
newPM = MLPRegressor(hidden_layer_sizes=((best_params['hidden_layer_sizes'])), #try reducing the layer size / increasing it and playing around with resultFit variable
                     batch_size='auto',
                     power_t=0.5,
                     activation=best_params['activation'],
                     solver=best_params['solver'], #non scaled input
                     learning_rate=best_params['learning_rate'],
                     max_iter=best_params['max_iter'],
                     learning_rate_init=best_params['learning_rate_init'],
                     alpha=best_params['alpha'],
                     random_state=best_params['random_state'],
                     early_stopping=best_params['early_stopping'],
                     tol=best_params['tol'])
scores = cross_val_score(newPM, X_train_scaled, labelsListTemp, cv=10, scoring='neg_mean_absolute_error')
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print(scores)
Output from line 63 onward:
0.9142644531564619
{'activation': 'logistic', 'alpha': 0.001, 'early_stopping': True, 'hidden_layer_sizes': 7, 'learning_rate': 'constant', 'learning_rate_init': 0.1, 'max_iter': 4000, 'random_state': 2, 'solver': 'lbfgs', 'tol': 0.01, 'verbose': True}
Accuracy: -21.91 (+/- 58.89)
[ -32.87854574 -105.0632913 -22.89836453 -7.33154414 -22.38773819 -3.3786339 -1.7658796 -3.78002866 -4.78727388 -14.8]
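
A note on the negative numbers above: with scoring='neg_mean_absolute_error', scikit-learn reports the mean absolute error with its sign flipped so that a higher score is always better. The values are therefore errors in the units of the target, not accuracies, and negating them gives the usual MAE. Below is a minimal sketch of that, not the code from the post. It assumes synthetic data in place of the downloaded prices and uses Ridge as a simple stand-in for MLPRegressor. It also places the scaler inside a Pipeline and uses TimeSeriesSplit, which is one possible way to make sure each fold is scaled and trained only on rows that come before the rows it is tested on (this assumes the rows are in chronological order).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weekly prices (an assumption, not the real data):
# one feature (the week index) and a noisy upward trend as the target.
rng = np.random.default_rng(0)
X = np.arange(1, 201, dtype=float).reshape(-1, 1)
y = 50.0 + 0.3 * X.ravel() + rng.normal(scale=2.0, size=200)

# Putting the scaler in the pipeline means it is re-fit on the training part
# of every fold, so no statistics from the held-out rows leak into training.
# Ridge is a hypothetical stand-in; an MLPRegressor could be used instead.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# TimeSeriesSplit trains on earlier rows and tests on later rows.
# This assumes X and y are sorted from oldest to newest.
cv = TimeSeriesSplit(n_splits=5)

# The scorer returns the *negated* MAE, so every value is <= 0.
neg_mae = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")

# Flip the sign to get the mean absolute error per fold.
mae = -neg_mae
print("MAE per fold:", mae)
print("Mean MAE: %0.2f (+/- %0.2f)" % (mae.mean(), mae.std() * 2))
```

Read this way, the output in the post would mean an average error of about 21.91 across the ten folds, with one fold far worse than the rest.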