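I am currently building a classifier for heavily imbalanced data. I use an imblearn pipeline that first applies StandardScaling, then SMOTE, and then the classifier, and I tune it with GridSearchCV. This ensures that the upsampling happens inside cross-validation, on the training folds only. Now I want to add feature_selection to my pipeline. How should I include this step in the pipeline?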
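from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

# X_train, y_train, X_test, y_test are assumed to be defined earlier.
# imblearn's Pipeline runs SMOTE only during fit, so each CV training fold is resampled separately.
model = Pipeline([
    ('sampling', SMOTE()),
    ('classification', RandomForestClassifier())
])
param_grid = {
    'classification__n_estimators': [10, 20, 50],
    'classification__max_depth': [2, 3, 5]
}
gridsearch_model = GridSearchCV(model, param_grid, cv=4, scoring=make_scorer(recall_score))
gridsearch_model.fit(X_train, y_train)
predictions = gridsearch_model.predict(X_test)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))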