python - How to use sklearn RFECV to select the best features to pass to a dimensionality-reduction step before fitting my estimator

Question

How can I use sklearn's RFECV to select the best features to pass to LinearDiscriminantAnalysis(n_components=2) for dimensionality reduction, before fitting my estimator with KNN?

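from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.feature_selection import RFECV
import matplotlib.pyplot as plt

pipeline = make_pipeline(Normalizer(), LinearDiscriminantAnalysis(n_components=2), KNeighborsClassifier(n_neighbors=10))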

X = self.dataset
y = self.postures

min_features_to_select = 1  # Minimum number of features to consider
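# The pipeline is the estimator whose features RFECV eliminates (cv=None means default 5-fold CV)
rfecv = RFECV(pipeline, step=1, cv=None, scoring='f1_weighted', min_features_to_select=min_features_to_select)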

rfecv.fit(X, y)

print(rfecv.support_)
print(rfecv.ranking_)
print("Optimal number of features : %d" % rfecv.n_features_)

# Plot number of features vs. cross-validation scores
# (rfecv.grid_scores_ in scikit-learn < 1.0; it was removed in 1.2)
mean_scores = rfecv.cv_results_["mean_test_score"]
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Mean cross-validation score (weighted F1)")
plt.plot(range(min_features_to_select,
               len(mean_scores) + min_features_to_select),
         mean_scores)
plt.show()

I get the following error from this code. If I run it without the LinearDiscriminantAnalysis() step it works, but that step is an important part of my processing.

*** ValueError: when `importance_getter=='auto'`, the underlying estimator Pipeline should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.

Accepted Answer

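There is a general problem with your approach: KNeighborsClassifier has no intrinsic measure of feature importance. It is therefore incompatible with RFECV, whose documentation describes the estimator it requires as: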

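A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.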

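With KNeighborsClassifier this will always fail, so you need a different classifier, for example RandomForestClassifier or SVC (with kernel='linear', since only the linear kernel provides coef_).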
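To check which classifiers satisfy that requirement, you can fit each one and test for the two attributes. This is a minimal sketch; the iris dataset and the helper name exposes_importance are assumptions for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data (assumption); any classification dataset would do.
X, y = load_iris(return_X_y=True)


def exposes_importance(estimator):
    """Hypothetical helper: fit the estimator and report whether RFECV's default
    importance_getter='auto' could use it (coef_ or feature_importances_)."""
    estimator.fit(X, y)
    return hasattr(estimator, "coef_") or hasattr(estimator, "feature_importances_")


results = {
    "KNeighborsClassifier": exposes_importance(KNeighborsClassifier(n_neighbors=10)),
    "RandomForestClassifier": exposes_importance(RandomForestClassifier(random_state=0)),
    # SVC.coef_ only exists for the linear kernel; other kernels raise AttributeError,
    # which hasattr() reports as False.
    "SVC(kernel='linear')": exposes_importance(SVC(kernel="linear")),
    "SVC(kernel='rbf')": exposes_importance(SVC(kernel="rbf")),
}
for name, usable in results.items():
    print(f"{name}: {usable}")
```

On a recent scikit-learn this should report False for KNeighborsClassifier and the RBF-kernel SVC, and True for the other two.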

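If you can switch to another classifier, your pipeline still has to expose the feature importances of the estimator inside it. For that you can refer to this answer, which defines a custom pipeline for exactly this purpose: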

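from sklearn.pipeline import Pipeline

class MyPipeline(Pipeline):
    """Pipeline that exposes its final estimator's importance attributes."""

    @property
    def coef_(self):
        return self._final_estimator.coef_

    @property
    def feature_importances_(self):
        return self._final_estimator.feature_importances_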
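As an alternative to subclassing (not part of the original answer): since scikit-learn 0.24, RFECV's importance_getter parameter also accepts a dotted attribute path, and the scikit-learn documentation gives named_steps.<step>.feature_importances_ as the Pipeline case. A minimal sketch, assuming the iris dataset as stand-in data and a pipeline without the LDA step:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Stand-in data (assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# make_pipeline names each step after its lowercased class name.
pipeline = make_pipeline(Normalizer(), RandomForestClassifier(n_estimators=50, random_state=0))

# Point RFECV at the final step's importances instead of relying on 'auto',
# which looks for coef_ / feature_importances_ on the Pipeline object itself.
rfecv = RFECV(
    pipeline,
    step=1,
    scoring="f1_weighted",
    min_features_to_select=1,
    importance_getter="named_steps.randomforestclassifier.feature_importances_",
)
rfecv.fit(X, y)

print(rfecv.support_)     # boolean mask over the 4 original features
print(rfecv.n_features_)  # number of features RFECV kept
```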

Then define your pipeline like this:

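from sklearn.preprocessing import Normalizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier

pipeline = MyPipeline([
    ('normalizer', Normalizer()),
    ('lda', LinearDiscriminantAnalysis(n_components=2)),
    ('rf', RandomForestClassifier())
])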

It should then work.
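Two caveats the answer does not mention. With LDA placed before the final estimator, the forwarded feature_importances_ has one entry per LDA component (2 here) rather than one per original feature, so the ranking RFECV derives from it does not refer to the input columns. Also, LinearDiscriminantAnalysis(n_components=2) needs at least two input features (and at least three classes), so with min_features_to_select = 1 current scikit-learn versions should raise a ValueError once RFECV fits on a single feature. One way to get what the question asks for, sketched here as an alternative rather than as the answer's method, is to make RFECV a step of the pipeline so that it selects original features before LDA. The RandomForestClassifier used only for ranking, the iris data and min_features_to_select=2 are assumptions; note that RFECV then scores the random forest alone, not the LDA + KNN part.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Stand-in data (assumption): 4 features, 3 classes, so LDA can output 2 components.
X, y = load_iris(return_X_y=True)

pipeline = make_pipeline(
    Normalizer(),
    # Feature selection on the original (normalized) columns. The random forest
    # only ranks features; it does not make the final predictions.
    RFECV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        step=1,
        scoring="f1_weighted",
        # At least 2 features must survive for LDA(n_components=2).
        min_features_to_select=2,
    ),
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=10),
)
pipeline.fit(X, y)

selector = pipeline.named_steps["rfecv"]
print(selector.support_)        # which original features were kept
print(pipeline.predict(X[:5]))  # predictions from LDA + KNN on the selected features
```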
