python - How do I tell the SHAP tree explainer and the SHAP value calculator which variables are categorical?

Question

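I need to understand my LightGBM model better, so I am using the SHAP TreeExplainer. LightGBM requires the data to be encoded, and I pass the same encoded data to the tree explainer. I am therefore worried that SHAP's TreeExplainer and shap_values() treat my data as numeric. How do I specify that the data is categorical? Would that change how the SHAP values are computed?

I have already looked through the documentation.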

score 6 · Accepted Answer

shap cannot handle features of type object. Just make sure that your continuous variables are of type float and your categorical variables are of type category.

for cont in continuous_variables:
    df[cont] = df[cont].astype('float64')

for cat in categorical_variables:
    df[cat] = df[cat].astype('category')
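
Before handing the frame to the explainer, it can help to confirm that no object-typed column is left over. The following is a minimal sketch using only pandas; the helper name unsupported_columns and the toy columns age and city are hypothetical and not part of the original answer.

```python
import pandas as pd

def unsupported_columns(df):
    """Return the names of columns that are neither numeric nor categorical."""
    bad = []
    for name in df.columns:
        col = df[name]
        is_categorical = isinstance(col.dtype, pd.CategoricalDtype)
        if not (is_categorical or pd.api.types.is_numeric_dtype(col)):
            bad.append(name)
    return bad

# Hypothetical toy data: one continuous column and one string column.
toy = pd.DataFrame({
    "age": [31.0, 45.0, 27.0],
    "city": ["Oslo", "Lima", "Oslo"],
})

before = unsupported_columns(toy)             # the string column is flagged
toy["city"] = toy["city"].astype("category")  # same conversion as above
after = unsupported_columns(toy)              # nothing is flagged any more
```

An empty list after the conversion indicates that every column is either numeric or of type category.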

Finally, you also need to make sure that the corresponding values are supplied in the parameters:

params = {
    'objective': "binary", 
    'num_leaves': 100, 
    'num_trees': 500, 
    'learning_rate': 0.1, 
    'tree_learner': 'data', 
    'device': 'cpu', 
    'seed': 132, 
    'max_depth': -1, 
    'min_data_in_leaf': 50, 
    'subsample': 0.9, 
    'feature_fraction': 1, 
    'metric': 'binary_logloss', 
    'categorical_feature': ['categoricalFeature1', 'categoricalFeature2']
}

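import lightgbm as lgbm
import shap

# Load the trained model and build the explainer on top of it.
bst = lgbm.Booster(model_file='model_file.txt')
tree_explainer = shap.TreeExplainer(bst)
# Attach the training parameters, including 'categorical_feature'.
tree_explainer.model.original_model.params = params

shap_values_result = tree_explainer.shap_values(df[features], y=df[target])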

Alternatively, you can apply label encoding to the categorical features. For example,

df['categoricalFeature'] = df['categoricalFeature'].astype('category')
df['categoricalFeature'] = df['categoricalFeature'].cat.codes

Note: make sure you can reproduce this mapping, so that you can transform the validation/test datasets in exactly the same way.
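
One way to keep the mapping reproducible is to learn it once from the training data and reuse it for every other split, rather than calling .cat.codes separately on each frame (which derives the codes from whatever categories that particular frame happens to contain). The sketch below assumes this approach; the frames train and test, the helper encode, and the choice to send unseen categories to -1 are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical training and test frames sharing one categorical column.
train = pd.DataFrame({"categoricalFeature": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"categoricalFeature": ["blue", "red", "purple"]})

# Learn the category-to-code mapping once, from the training data only.
categories = sorted(train["categoricalFeature"].unique())
mapping = {category: code for code, category in enumerate(categories)}

def encode(series, mapping):
    """Apply a fixed mapping; categories not seen in training become -1."""
    return series.map(mapping).fillna(-1).astype("int64")

train_codes = encode(train["categoricalFeature"], mapping)
test_codes = encode(test["categoricalFeature"], mapping)
```

Here -1 is the same value that .cat.codes uses for missing entries. Storing mapping alongside the model (for example as JSON) lets the identical encoding be applied at prediction time.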
