I am trying to use GridSearchCV to find the best estimator. If I call it with these parameters:
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=42)
parameters = {'max_depth': [2,3,4,5,6,7,8,9,10],
              'min_samples_leaf': [2,3,4,5,6,7,8,9,10],
              'min_samples_split': [2,3,4,5,6,7,8,9,10]}
scorer = make_scorer(f1_score)
grid_obj = GridSearchCV(clf, parameters, scoring=scorer)
grid_fit = grid_obj.fit(X_train, y_train)
# best_estimator_ is already refit on X_train (refit=True by default),
# so the explicit fit below is redundant but harmless
best_clf = grid_fit.best_estimator_
best_clf.fit(X_train, y_train)
best_train_predictions = best_clf.predict(X_train)
best_test_predictions = best_clf.predict(X_test)
# f1_score expects (y_true, y_pred)
print('The training F1 Score is', f1_score(y_train, best_train_predictions))
print('The testing F1 Score is', f1_score(y_test, best_test_predictions))
the result is:
The training F1 Score is 0.784810126582
The testing F1 Score is 0.72
With the same data, the result is different when the only change is replacing [2,3,4,5,6,7,8,9,10] with [2,4,6,8,10]:
clf = DecisionTreeClassifier(random_state=42)
parameters = {'max_depth': [2,4,6,8,10],
              'min_samples_leaf': [2,4,6,8,10],
              'min_samples_split': [2,4,6,8,10]}
scorer = make_scorer(f1_score)
grid_obj = GridSearchCV(clf, parameters, scoring=scorer)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_
best_clf.fit(X_train, y_train)
best_train_predictions = best_clf.predict(X_train)
best_test_predictions = best_clf.predict(X_test)
# f1_score expects (y_true, y_pred)
print('The training F1 Score is', f1_score(y_train, best_train_predictions))
print('The testing F1 Score is', f1_score(y_test, best_test_predictions))
The result:
The training F1 Score is 0.814814814815
The testing F1 Score is 0.8
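I am confused about how GridSearchCV works. Why does the smaller grid, whose values are a subset of the first, give better scores?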
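One thing that may help narrow this down is comparing what each search actually selected. Below is a minimal sketch, assuming synthetic data from make_classification as a stand-in (it is not my real dataset) and a hypothetical helper run_search; it prints best_params_ and best_score_ for both grids next to the test F1.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the dataset from the question).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


def run_search(values):
    """Hypothetical helper: grid-search a tree using `values` for all three parameters."""
    parameters = {'max_depth': values,
                  'min_samples_leaf': values,
                  'min_samples_split': values}
    grid = GridSearchCV(DecisionTreeClassifier(random_state=42), parameters,
                        scoring=make_scorer(f1_score))
    grid.fit(X_train, y_train)
    return grid


full = run_search([2, 3, 4, 5, 6, 7, 8, 9, 10])
even = run_search([2, 4, 6, 8, 10])

for name, grid in [('full', full), ('even', even)]:
    # best_score_ is the mean cross-validated F1 over held-out folds of X_train;
    # X_test plays no part in choosing the parameters.
    print(name, grid.best_params_, 'CV F1:', round(grid.best_score_, 3))
    print(name, 'test F1:', round(f1_score(y_test, grid.predict(X_test)), 3))
```

Since the second grid is a subset of the first, the first search's best_score_ cannot be lower than the second's. The test scores can still go either way, because X_test is never used for the selection.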