python - Recursive feature elimination with cross-validation for regression in scikit-learn

Question

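I want to apply a wrapper method such as recursive feature elimination to my regression problem using scikit-learn. "Recursive feature elimination with cross-validation" gives a good overview of how to tune the number of features automatically.

I tried this: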

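import matplotlib.pyplot as plt
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

modelX = LogisticRegression()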
rfecv = RFECV(estimator=modelX, step=1, scoring='mean_absolute_error')
rfecv.fit(df_normdf, y_train)
print("Optimal number of features : %d" % rfecv.n_features_)

# Plot number of features VS. cross-validation scores
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()

But I get an error message like this:

The least populated class in y has only 1 members, which is too few.
The minimum number of labels for any class cannot be less than n_folds=3. % (min_labels, self.n_folds)), Warning)

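The warning sounds as if I had a classification problem, but my task is a regression problem. What went wrong, and what can I do to get a result?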

score 1 · Accepted Answer

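Here is what happened:

By default, when the user does not specify the number of folds, RFECV uses 3-fold cross-validation (the n_folds=3 in your warning). So far, so good.

However, if you look at the documentation, it also uses StratifiedKFold, which builds the folds so that each one preserves the percentage of samples of every class. Since some values in your output y appear (according to the error) to be unique, they cannot be present in 3 different folds at the same time. So it complains!

The error comes from here.

You then need to use the unstratified K-fold: KFold.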
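The effect can be reproduced in isolation. The following is a minimal sketch, assuming integer-valued targets in which one value occurs only once; the toy arrays are made up for illustration and are not the asker's data. Integer targets are interpreted as class labels, so stratification has to spread a single sample over three folds, and recent scikit-learn versions warn about it with slightly different wording.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.multiclass import type_of_target

# Toy data (hypothetical): integer targets where the value 2 occurs only once
X = np.zeros((11, 1))
y = np.array([0] * 5 + [1] * 5 + [2])

# Integer-valued targets are interpreted as class labels
target_type = type_of_target(y)
print(target_type)

# Collect the warnings raised while the stratified folds are built
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(StratifiedKFold(n_splits=3).split(X, y))

for w in caught:
    print(w.message)
```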

The documentation for RFECV says: "If the estimator is not a classifier or if y is neither binary nor multiclass, sklearn.model_selection.KFold is used."
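Putting this together, here is a hedged sketch of what the call could look like with an explicit, unstratified KFold. Several things here are assumptions rather than part of the question: the synthetic data from make_regression stands in for df_normdf and y_train, LinearRegression is swapped in because the task is described as regression (LogisticRegression is a classifier), and the scorer name is 'neg_mean_absolute_error', which is what newer scikit-learn versions use in place of 'mean_absolute_error'.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for df_normdf / y_train (assumption: 10 features, 3 informative)
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=0.1, random_state=0
)

# Explicit unstratified folds, so no per-class balancing is attempted
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# A regressor instead of LogisticRegression, scored by negated mean absolute error
rfecv = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=cv,
    scoring="neg_mean_absolute_error",
)
rfecv.fit(X, y)

print("Optimal number of features : %d" % rfecv.n_features_)
```

Passing the KFold object through cv makes the choice of splitter explicit instead of relying on what RFECV infers from the estimator and the target type.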
