Thanks in advance for any answers. This is my first post and I'm fairly new to Python, so I apologize if I've formatted anything horribly.
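I'm trying to combine recursive feature elimination and grid search in sklearn to find the best combination of hyperparameters and number of features. With the code below, I get `max_features must be in (0, n_features]` and `Estimator fit failed.` for every value of `max_features` other than 1. My dataset has more than 300 features, many of which are probably unimportant.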
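The error message comes from the decision tree's check that an integer `max_features` cannot exceed the number of columns the tree is fit on. RFECV with `step=1` refits the forest on ever-smaller feature subsets (down to `min_features_to_select`, which defaults to 1), so any integer `max_features` larger than the remaining column count would presumably trip that check at the late elimination steps, which would explain why only `max_features=1` never fails. A minimal sketch reproducing the check in isolation, using hypothetical toy data (`X_small`, `y_small` and `fit_succeeds` are illustrative names, not part of the question):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for a late RFECV step where only 3 features remain.
rng = np.random.default_rng(0)
X_small = rng.normal(size=(40, 3))
y_small = (X_small[:, 0] > 0).astype(int)


def fit_succeeds(max_features):
    """Return True if a small forest with this max_features can be fit on X_small."""
    clf = RandomForestClassifier(n_estimators=5, max_features=max_features, random_state=0)
    try:
        clf.fit(X_small, y_small)
        return True
    except ValueError as exc:  # e.g. "max_features must be in (0, n_features]"
        print(exc)
        return False


# Integers are used as-is; the float 0.5 is resolved to max(1, int(0.5 * 3)) = 1.
results = {mf: fit_succeeds(mf) for mf in [1, 3, 5, 0.5]}
print(results)
```

With three columns, the values 1 and 3 and the fraction 0.5 should fit, while 5 should raise the `ValueError`.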
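```python
import sklearn.ensemble
import sklearn.feature_selection
import sklearn.model_selection
# RFECV exposes the wrapped forest as `estimator`, hence the "estimator__" prefix.
param_dist = {'estimator__n_estimators': [i for i in range(11, 121, 10)],
              'estimator__criterion': ['gini', 'entropy'],
              'estimator__max_features': [i for i in range(1, 10)]}
# 'sqrt' is what 'auto' meant for classifiers; 'auto' was removed in newer sklearn releases.
estimator = sklearn.ensemble.RandomForestClassifier(n_jobs=-1, random_state=42, bootstrap=True,
                                                    verbose=True, max_features='sqrt')
# Drop one feature per step; the number of features to keep is chosen by 5-fold CV.
selector = sklearn.feature_selection.RFECV(estimator=estimator, step=1, cv=5,
                                           scoring='accuracy')
# Outer grid search over the forest's hyperparameters, with RFECV nested inside it.
rf_nested = sklearn.model_selection.GridSearchCV(estimator=selector, param_grid=param_dist, cv=5,
                                                 scoring='accuracy', n_jobs=-1, refit=True,
                                                 return_train_score=True)
rf_nested.fit(X_train, y_train)  # X_train, y_train: my training data
```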
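A sketch of one possible workaround, assuming synthetic data in place of `X_train`/`y_train`: express `max_features` as a fraction, which sklearn resolves to `max(1, int(fraction * n_features))` each time the forest is fit, so it stays valid at every RFECV step. The dataset size and grid values are illustrative assumptions chosen to run quickly, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (assumption: the real data has 300+ columns).
X, y = make_classification(n_samples=120, n_features=8, n_informative=4, random_state=0)

# Fractional max_features scales with however many features RFECV has left,
# so no fit inside the elimination loop can request more features than exist.
param_grid = {
    'estimator__n_estimators': [10, 20],
    'estimator__criterion': ['gini', 'entropy'],
    'estimator__max_features': [0.25, 0.5],
}

forest = RandomForestClassifier(random_state=42)
selector = RFECV(estimator=forest, step=1, cv=3, scoring='accuracy')
search = GridSearchCV(selector, param_grid=param_grid, cv=3, scoring='accuracy',
                      error_score='raise')  # fail loudly instead of recording NaN scores
search.fit(X, y)

print(search.best_params_)
print('features kept:', search.best_estimator_.n_features_)
```

An alternative that keeps integer values is `RFECV(..., min_features_to_select=9)`, so elimination stops before the feature count drops below the largest `max_features` in the grid; the trade-off is that RFECV can then never select fewer than 9 features.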