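I have a loop that finds the final parameters for running my models, but I cannot retrieve the parameters that were found and used to run the data. I am looking for an adjustment to the code that reports the best parameters. The code takes about 20 minutes to run on a numeric dataset with 1500 rows and 200 columns.

Here is what I have that produces the final results.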

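import numpy as np
import pandas as pd
from sklearn import linear_model, metrics
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# X and Y are assumed to be the feature and target column names, defined elsewhere.
def test(models, data, iterations=100):
    results = {}
    for name, model in models.items():
        r2_train = []
        r2_test = []
        for _ in range(iterations):
            X_train, X_test, y_train, y_test = train_test_split(data[X], data[Y], test_size=0.2)
            fitted = model.fit(X_train, y_train)  # fit once per split, score on both sets
            r2_test.append(metrics.r2_score(y_test, fitted.predict(X_test)))
            r2_train.append(metrics.r2_score(y_train, fitted.predict(X_train)))
        results[name] = [np.mean(r2_train), np.mean(r2_test)]  # row 0: train, row 1: test
    return pd.DataFrame(results)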

lasso_params = {'fit__alpha':[0.005, 0.02, 0.03, 0.05, 0.06]}
ridge_params = {'fit__alpha':[550, 580, 600, 620, 650]}

pipe1 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.LinearRegression())])
pipe2 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Lasso())])
pipe3 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Ridge())])

models3 = {
    'OLS': pipe1,
    'Lasso': GridSearchCV(pipe2, param_grid=lasso_params).fit(train[X], train[Y]).best_estimator_,
    'Ridge': GridSearchCV(pipe3, param_grid=ridge_params).fit(train[X], train[Y]).best_estimator_,
}

This is how I call it:

test(models3, train)

This is what it produces:

       OLS       Lasso       Ridge
0   1.000000    0.914186    0.985494
1   0.700401    0.877555    0.867068

I would also like to get the parameters that produced these results. Thanks in advance for any help and clarification.
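For reference, here is a minimal sketch of one way the chosen parameters could be reported. It keeps the fitted `GridSearchCV` object around and reads its `best_params_` attribute, and it also reads the value back from the stored best estimator. The synthetic `X_demo` / `y_demo` arrays are stand-ins I made up, since the real `train[X]` / `train[Y]` are not shown; only the Ridge pipeline is used to keep the example short.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for train[X] / train[Y] (assumption: the real data is not shown).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 3))
y_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=120)

pipe = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Ridge())])
ridge_params = {'fit__alpha': [550, 580, 600, 620, 650]}

# Keep the fitted search object instead of only its best_estimator_.
search = GridSearchCV(pipe, param_grid=ridge_params).fit(X_demo, y_demo)

best_params = search.best_params_      # dict keyed by the grid's parameter names
best_model = search.best_estimator_    # pipeline refit with those parameters

# The same value can also be read back from the best estimator itself,
# which is what a dict like models3 already stores.
alpha_from_model = best_model.named_steps['fit'].alpha

print(best_params, alpha_from_model)
```

If only the `best_estimator_` pipelines are kept, as in `models3`, something like `models3['Ridge'].named_steps['fit'].alpha` (or `get_params()` on the pipeline) should give the same information without rerunning the search.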
