I am running a random forest classifier on my dataset as one step of a sklearn pipeline.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectPercentile, VarianceThreshold, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numerical
numeric_cols = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7']
numeric_transformer = Pipeline(
    steps=[("scaler", StandardScaler())]
)

# Categorical
categ_cols = ['p8', 'p9', 'p10', 'p11', 'p12', 'p13']
categ_transformer = OneHotEncoder(handle_unknown="ignore")

# Preprocessing
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_cols),
        ("cat", categ_transformer, categ_cols),
    ]
)

# Full pipeline: preprocessing, two feature-selection steps, then the classifier
rf_pipe = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("feature_selection_var", VarianceThreshold()),
        ("feature_selection_percentile", SelectPercentile(f_classif, percentile=90)),
        ("classifier", RandomForestClassifier(n_jobs=-1, class_weight='balanced',
                                              criterion='entropy', max_features=10,
                                              min_samples_leaf=50, n_estimators=1000)),
    ]
)

cross_score = cross_val_score(rf_pipe, x_train_up, y_train_up, cv=10, scoring='roc_auc', n_jobs=-1)
print(f'cross_mean: {cross_score.mean()}, cross_std: {cross_score.std()}')

rf_pipe.fit(x_train_up, y_train_up)
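I want to plot the RFC's feature_importances_ attribute, but because my pipeline performs feature selection, I cannot identify the names of the features used in the fit method. I do know that after the OneHotEncoder step the array X contains 31 features, and that after SelectPercentile it contains the 27 features the RFC is fitted on.
How can I determine which features were selected and used to fit the RFC? When I access the RFC's attributes I only get the numeric feature importances; the names are not available.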
rf_pipe.named_steps['classifier'].feature_importances_
out: array([8.41159321e-02, 1.23094971e-01, 1.62218154e-02, 3.34926745e-01,
1.06620128e-01, 1.37351967e-01, 9.39408084e-03, 1.74327442e-02,
1.62594558e-02, 1.66887184e-04, 1.66724711e-02, 7.06176017e-03,
6.81514535e-03, 1.11633257e-02, 1.32052716e-02, 3.72520454e-03,
3.64255314e-03, 1.25925324e-02, 1.12110261e-02, 9.37540757e-04,
7.53327441e-03, 7.30348346e-03, 1.40424287e-02, 2.04903820e-03,
1.73613154e-02, 9.33500153e-03, 9.76390164e-03])
rf_pipe.named_steps['classifier'].feature_names_in_
out:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
C:\Users\PHELIP~1.SOA\AppData\Local\Temp/ipykernel_10268/205801647.py in <module>
----> 1 rf_pipe.named_steps['classifier'].feature_names_in_
AttributeError: 'RandomForestClassifier' object has no attribute 'feature_names_in_'