scikit-learn - XGBoost plot importance F score values > 100

Question

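I have plotted the XGBoost feature importance for all the features in my model, as shown in the figure below. As you can see, the F score values in the plot are not normalized (they are not in the range 0 to 100). Please let me know if you have any idea why this happens. Do I need to pass a parameter to the plot_importance function to normalize them?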

score 2 · Accepted Answer

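The feature importance drawn by plot_importance is determined by its importance_type argument, which defaults to weight. There are 3 options: weight, gain and cover. None of them is a percentage, though.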

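From the documentation for this method:

importance_type (str, default "weight") – How the importance is calculated: either "weight", "gain", or "cover"

"weight" is the number of times a feature appears in a tree

"gain" is the average gain of splits which use the feature

"cover" is the average coverage of splits which use the feature, where coverage is defined as the number of samples affected by the split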
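To see why a "weight" score can easily exceed 100, here is a minimal sketch that mimics the counting in plain Python. The `toy_trees` data and the `weight_importance` helper are hypothetical illustrations, not part of XGBoost; they only model the idea that every split using a feature adds one to its score.

```python
from collections import Counter

# Hypothetical toy ensemble: each inner list names the feature used at every
# split of one tree. This is only a model of the idea, not XGBoost internals.
toy_trees = [
    ["f0", "f1", "f0"],
    ["f0", "f2"],
    ["f1", "f0", "f0", "f2"],
]


def weight_importance(trees):
    """Count how many splits use each feature across all trees."""
    counts = Counter()
    for splits in trees:
        counts.update(splits)
    return dict(counts)


weights = weight_importance(toy_trees)
print(weights)

# Repeating the trees 50 times mimics a larger ensemble: the raw counts grow
# with the number of trees and are not bounded by 100.
big_weights = weight_importance(toy_trees * 50)
print(big_weights["f0"])
```

Because the score is a raw count, it grows with the number of trees and splits, so it has no natural upper bound.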

So, long story short: there is no trivial solution to what you want.

Workaround

The model's feature_importances_ attribute is normalized the way you want. You can plot it yourself, but it will be a handcrafted chart.

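First, make sure you set the importance_type parameter of the Classifier to one of the options listed above (the constructor's default is gain, so if you don't change it you will see a discrepancy with what plot_importance draws).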

import xgboost as xgb
best_model = xgb.XGBClassifier(importance_type='weight')

After that, you can try something along these lines, once the model has been fitted:

import pandas as pd

best_model.feature_importances_
# In my toy example: array([0.21473685, 0.19157895, 0.28842106, 0.30526316], dtype=float32)

best_model.feature_importances_.sum()
#  1.0

# Build a simple dataframe with the feature importances
# You can change the naming fN to something more human readable
fs = len(best_model.feature_importances_)
df = pd.DataFrame(zip([f"f{n}" for n in range(fs)], best_model.feature_importances_), columns=['Features', 'Feature Importance'])
df = df.set_index('Features').sort_values('Feature Importance')

# Build horizontal bar chart
ax = df.plot.barh(color='red', alpha=0.5, grid=True, legend=False, title='Feature importance', figsize=(15, 5))

# Annotate bar chart, adapted from this SO answer:
# https://stackoverflow.com/questions/25447700/annotate-bars-with-values-on-pandas-bar-plots
for p, value in zip(ax.patches, df['Feature Importance']):
    ax.annotate(f"{value:.2f}", (p.get_width() * 1.005, p.get_y() * 1.005))

With this approach I get a chart like the following, which is close enough to the original:
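If you would rather keep the raw scores and rescale them to percentages yourself, a small helper is enough. The sketch below is only an illustration: `to_percent` is a hypothetical name, and `raw_scores` is made-up data shaped like the feature-to-score dict that `Booster.get_score` returns.

```python
def to_percent(scores):
    """Rescale a {feature: raw_score} mapping so that the values sum to 100."""
    total = sum(scores.values())
    if total == 0:
        # Avoid dividing by zero when no feature was ever used in a split
        return {name: 0.0 for name in scores}
    return {name: 100.0 * value / total for name, value in scores.items()}


# Hypothetical raw "weight" scores (made-up numbers, not real model output)
raw_scores = {"f0": 204.0, "f1": 182.0, "f2": 274.0, "f3": 290.0}
percent_scores = to_percent(raw_scores)

# Print features from most to least important
for name, pct in sorted(percent_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

With a fitted model, the raw dict would come from best_model.get_booster().get_score(importance_type='weight'). As far as I know, plot_importance also accepts such a dict in place of the model, so the rescaled values could be passed to it directly. Check this against the documentation for your XGBoost version.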
