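I have a loop that finds the final parameters used to run each model, but I can't retrieve the parameters that were found and then used on the data. I am looking for a tweak to the code that reports the best parameters. The code takes about 20 minutes to run on a numeric dataset of 1500 rows and 200 columns.
Here is what I have so far; it produces the final results.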
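import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split

# X and Y (feature and target column names) are assumed to be defined earlier.
def test(models, data, iterations=100):
    results = {}
    for i in models:
        r2_train = []
        r2_test = []
        for j in range(iterations):
            X_train, X_test, y_train, y_test = train_test_split(data[X], data[Y], test_size=0.2)
            # Fit once per split, then score the same fitted model on both halves.
            fitted = models[i].fit(X_train, y_train)
            r2_test.append(metrics.r2_score(y_test, fitted.predict(X_test)))
            r2_train.append(metrics.r2_score(y_train, fitted.predict(X_train)))
        # Row 0 holds the mean train R^2, row 1 the mean test R^2.
        results[i] = [np.mean(r2_train), np.mean(r2_test)]
    return pd.DataFrame(results)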
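from sklearn import linear_model
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# 'fit__alpha' targets the alpha parameter of the pipeline step named 'fit'.
lasso_params = {'fit__alpha': [0.005, 0.02, 0.03, 0.05, 0.06]}
ridge_params = {'fit__alpha': [550, 580, 600, 620, 650]}
pipe1 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.LinearRegression())])
pipe2 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Lasso())])
pipe3 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Ridge())])

# train is the training DataFrame, assumed to be defined earlier.
models3 = {'OLS': pipe1,
           'Lasso': GridSearchCV(pipe2, param_grid=lasso_params).fit(train[X], train[Y]).best_estimator_,
           'Ridge': GridSearchCV(pipe3, param_grid=ridge_params).fit(train[X], train[Y]).best_estimator_}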
This is how I call it:
test(models3, train)
This is what it produces:
        OLS     Lasso     Ridge
0  1.000000  0.914186  0.985494
1  0.700401  0.877555  0.867068
I would also like to get the parameters that produced these results. Thanks in advance for any help and clarification.
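One possible way to report the chosen parameters is to keep each fitted GridSearchCV object around instead of taking only its best_estimator_, and then read its best_params_ attribute. The sketch below is only an illustration under assumptions: it uses a small synthetic dataset from make_regression in place of train[X] and train[Y], the names X_demo, y_demo, searches, best_params, and tuned_models are hypothetical, and max_iter=10000 on Lasso is added only to reduce convergence warnings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for train[X] / train[Y] (an assumption, not the original data).
X_demo, y_demo = make_regression(n_samples=120, n_features=4, noise=5.0, random_state=0)

lasso_params = {'fit__alpha': [0.005, 0.02, 0.03, 0.05, 0.06]}
ridge_params = {'fit__alpha': [550, 580, 600, 620, 650]}

# max_iter is raised here only to limit convergence warnings (an assumption).
pipe2 = Pipeline([('poly', PolynomialFeatures()), ('fit', Lasso(max_iter=10000))])
pipe3 = Pipeline([('poly', PolynomialFeatures()), ('fit', Ridge())])

# Keep the fitted search objects rather than discarding them after .best_estimator_.
searches = {
    'Lasso': GridSearchCV(pipe2, param_grid=lasso_params).fit(X_demo, y_demo),
    'Ridge': GridSearchCV(pipe3, param_grid=ridge_params).fit(X_demo, y_demo),
}

# best_params_ holds the winning grid point; best_score_ is its mean CV score.
best_params = {name: search.best_params_ for name, search in searches.items()}
for name, params in best_params.items():
    print(name, params, round(searches[name].best_score_, 4))

# The refit pipelines can still be passed to test() exactly as before.
tuned_models = {name: search.best_estimator_ for name, search in searches.items()}
```

If the existing models3 dictionary is kept unchanged, the selected value should also be readable from the refit pipeline itself, for example models3['Lasso'].get_params()['fit__alpha'] or models3['Lasso'].named_steps['fit'].alpha.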