python-3.x - Why do I get different results when using StandardScaler in GridSearchCV?

Question

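I want to optimize the hyperparameters of an SVM with GridSearchCV. However, the score of the best estimator is very different from the score I get when I run an SVM with the best parameters.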

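#### Hyperparameter search with GridSearchCV ###

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# The pipeline scales the data itself before fitting the SVM
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", LinearSVC(loss='hinge'))])

# c_range holds the candidate values for C (its definition is not shown)
param_grid = [{'svm__C': c_range}]

clf = GridSearchCV(pipeline, param_grid=param_grid, cv=5, scoring='accuracy')
clf.fit(X, y)
print('\n Best score: ', clf.best_score_)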


#### scale train and test data ###

# Fit the scaler on the training data, then apply it to both sets
sc = StandardScaler()
sc.fit(X)
X = sc.transform(X)
X_test = sc.transform(X_test)


###### test best estimator with test data ###################

print("Best estimator score: ", clf.best_estimator_.score(X_test, y_test))


##### run SVM with the best found parameter #####

# best_params_ uses the same key as param_grid: 'svm__C' (double underscore)
svc = LinearSVC(C=clf.best_params_['svm__C'])
svc.fit(X, y)
print("score with best parameter: ", svc.score(X_test, y_test))

The results are as follows:

Best score: 0.784

Best estimator score: 0.6991

score with best parameter: 0.7968

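I don't understand why the scores of the best estimator and the SVM differ. Which of these results is the correct test accuracy? Why is the best estimator's score of 0.6991 so much worse? Did I do something wrong?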

score 1 · Accepted Answer

In the following line:

print("Best estimator score: ", clf.best_estimator_.score(X_test, y_test))

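You are passing an X_test that has already been scaled to clf, which is a pipeline containing another scaler, so the test data is effectively scaled twice. In your last prediction statement, by contrast, you pass the scaled data to svc, which only fits the model and does no scaling of its own. The data fed to the two models is therefore quite different, and so are the predictions. To score best_estimator_ on comparable data, pass it the unscaled X_test and let the pipeline do the scaling.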
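To make the double-scaling effect concrete, here is a minimal sketch using only the Python standard library. The feature values are made up for illustration, and fit_scaler and transform are hypothetical helpers that mimic what StandardScaler does for a single feature (subtract the training mean, divide by the population standard deviation).

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, std) for one feature, as a standard scaler would learn them."""
    return mean(values), pstdev(values)

def transform(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical raw values of a single feature
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [11.0, 15.0, 19.0]

mu, sigma = fit_scaler(train)

# What the SVM inside the pipeline expects to see: test data scaled once
scaled_once = transform(test, mu, sigma)

# What it sees if already-scaled data is passed to the pipeline:
# the training mean and std are applied a second time
scaled_twice = transform(scaled_once, mu, sigma)

print("scaled once: ", scaled_once)
print("scaled twice:", scaled_twice)
```

With these numbers the once-scaled values sit around zero, while the twice-scaled values are all pushed far below zero, a region the model never saw during training. That would plausibly explain a drop in accuracy like the one in the question.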

Hope this helps!
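As a follow-up to the answer above, here is a sketch of a consistent version of the workflow. It is not the asker's actual setup: the data is a synthetic stand-in generated with make_classification, the C grid is a hypothetical replacement for c_range, and max_iter and random_state are set only so that the two runs are comparable. The raw test data goes to the pipeline, and the manual SVM uses the same loss and C as the pipeline's SVM.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption: the real X, y, X_test, y_test are not shown)
X_all, y_all = make_classification(n_samples=400, n_features=10, random_state=0)
X, X_test, y, y_test = train_test_split(X_all, y_all, test_size=0.25, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", LinearSVC(loss="hinge", max_iter=100000, random_state=0)),
])
param_grid = [{"svm__C": [0.01, 0.1, 1, 10]}]  # hypothetical stand-in for c_range

clf = GridSearchCV(pipeline, param_grid=param_grid, cv=5, scoring="accuracy")
clf.fit(X, y)

# Pass the RAW test data: the pipeline applies its own fitted scaler
pipeline_score = clf.best_estimator_.score(X_test, y_test)

# Manual equivalent: scale exactly once, same loss and C as in the pipeline
sc = StandardScaler().fit(X)
svc = LinearSVC(loss="hinge", C=clf.best_params_["svm__C"],
                max_iter=100000, random_state=0)
svc.fit(sc.transform(X), y)
manual_score = svc.score(sc.transform(X_test), y_test)

print("Pipeline on raw test data:", pipeline_score)
print("Manual scaling + SVM:     ", manual_score)
```

Under these assumptions the two test scores should agree, because both models see data that has been scaled exactly once with statistics learned from the training set. Note that the LinearSVC in the question's last step does not set loss='hinge', so even with scaling fixed it would not be the same model as the one tuned in the pipeline.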
