python - Scaling training data internally during CV with GridSearchCV for hyperparameter optimization

Question

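I am trying to use GridSearchCV for SVM hyperparameter optimization. Assume I pass the training-set data and labels to this function (the test split has already been held out before the function call).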

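import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def param_search(X, y):
    # Log-spaced candidate values: C in 1e-3..1e3, gamma in 1e-3..1e2
    Cs = 10. ** np.arange(-3, 4)
    gammas = 10. ** np.arange(-3, 3)

    # The 'clf__' prefix routes each parameter to the 'clf' pipeline step
    rbf_grid = {'clf__C': Cs, 'clf__gamma': gammas, 'clf__kernel': ['rbf'],
        'clf__class_weight': ['balanced']}
    lin_grid = {'clf__C': Cs, 'clf__kernel': ['linear'],
        'clf__class_weight': ['balanced']}

    # Scaler and classifier are fit together, as one estimator
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', svm.SVC())])

    grid_search = GridSearchCV(pipe, param_grid=[rbf_grid, lin_grid],
        cv=StratifiedKFold(n_splits=5, shuffle=True), verbose=2, n_jobs=-1)
    grid_search.fit(X, y)
    return grid_search.best_params_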

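I want GridSearchCV to evaluate each CV split on data scaled with statistics computed from that split's training portion. Is StandardScaler() currently fit 5 times for each set of parameters (which is what I want), or only once, when GridSearchCV is first called?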

score 0 · Accepted Answer

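Short answer: yes, StandardScaler is fit 5 times for each set of parameters. Basically, the whole pipeline is fit once for every split and every parameter choice, using only that split's training portion, and is then evaluated on the held-out portion. With the default refit=True, the pipeline is also fit one more time on all of X with the best parameters found.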
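
One way to check this yourself is to count the scaler fits. The sketch below is only an illustration, not part of the original question: CountingScaler is a hypothetical helper subclass, the data is synthetic, the grid is cut down to two linear-kernel candidates, and the search runs in a single process (no n_jobs=-1) so that the class-level list is shared by every cloned scaler.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class CountingScaler(StandardScaler):
    """Hypothetical helper: records how many rows each fit call receives."""

    fit_sizes = []  # class-level, so every clone made by GridSearchCV appends here

    def fit(self, X, y=None, **kwargs):
        CountingScaler.fit_sizes.append(X.shape[0])
        return super().fit(X, y, **kwargs)


# Synthetic, exactly balanced two-class data (assumption for the demo)
rng = np.random.RandomState(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4))
X[y == 1] += 1.0  # shift one class so the problem is learnable

pipe = Pipeline([('scaler', CountingScaler()), ('clf', SVC())])
grid = {'clf__C': [0.1, 1.0], 'clf__kernel': ['linear']}  # 2 candidates

search = GridSearchCV(
    pipe,
    param_grid=grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# 2 candidates x 5 folds = 10 fits on 80 rows each,
# plus 1 final refit on all 100 rows (refit=True is the default)
print(len(CountingScaler.fit_sizes))
print(CountingScaler.fit_sizes)
```

Every cross-validation fit sees only the 80 training rows of its split, so the scaling statistics never include the held-out fold. The last entry in the list comes from the final refit on the full training set.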
