I have a problem with my SHAP values. Here is my model:

Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_5 (InputLayer)            [(None, 158)]        0                                            
__________________________________________________________________________________________________
model_1 (Model)                 (None, 158)          57310       input_5[0][0]                    
__________________________________________________________________________________________________
subtract_4 (Subtract)           (None, 158)          0           input_5[0][0]                    
                                                                 model_1[5][0]                    
__________________________________________________________________________________________________
multiply_4 (Multiply)           (None, 158)          0           subtract_4[0][0]                 
                                                                 subtract_4[0][0]                 
__________________________________________________________________________________________________
lambda_4 (Lambda)               (None,)              0           multiply_4[0][0]                 
__________________________________________________________________________________________________
reshape_3 (Reshape)             (None, 1)            0           lambda_4[0][0]                   
==================================================================================================
Total params: 57,310
Trainable params: 57,310
Non-trainable params: 0
__________________________________________________________________________________________________

I call:

scores = new_model.predict(X_test_scaled)
scores = scores.reshape(scores.shape[0],1)
toexplain = np.append(X_test_scaled, scores, axis = 1)
toexplain = pd.DataFrame(toexplain)
toexplain.sort_values(by = [158], ascending=False, inplace=True)
toexplain = toexplain.iloc[0:16]
toexplain.drop(columns = [158], axis = 1, inplace = True)

explainer=shap.DeepExplainer(new_model, df_sampled_X_train_scaled)
shap_values = explainer.shap_values(toexplain, check_additivity=False)

But my SHAP values look like this (for the first instance):

shap_values[0]

array([        nan,         nan,         nan,  0.08352888,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,  0.03286453,
               nan,         nan,  0.2984612 ,         nan,         nan,
               nan,  0.01110088, -0.85235232,         nan,         nan,
               nan,         nan,         nan,         nan, -0.27935541,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan, -0.18422949,  0.01466912,         nan,
               nan,         nan, -0.1688329 ,  0.07462809,  0.03071906,
               nan, -0.00554245,         nan,         nan,         nan,
               nan,  0.04587848,         nan,         nan,         nan,
               nan,  0.05448143,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,  0.00933742,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,  0.00919492,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan])

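I'm fairly sure there shouldn't be any NaN values in my shap_values, but I can't track down the underlying problem. In addition, the prediction shown by shap.force_plot differs from my model's prediction, which is why I inspected shap_values in the first place.

Does anyone know how I can fix this?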


1 Answer


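OK, after reading the shap source code, I realized that it does not account for the data being a pandas DataFrame, even though the documentation says otherwise.

It works with NumPy arrays, so the data (here both df_sampled_X_train_scaled and toexplain) should be converted to arrays before being passed to the explainer.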
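A minimal sketch of that workaround, assuming the only change needed is to hand the explainer plain arrays instead of DataFrames. The helper name as_array is hypothetical, and the small DataFrame below is a stand-in for the sorted, sliced toexplain from the question. Note that after sort_values the DataFrame keeps its shuffled index labels, which a plain array does not carry; whether that is the exact mechanism behind the NaNs inside shap is not confirmed here.

```python
import numpy as np
import pandas as pd


def as_array(data):
    """Return `data` as a plain NumPy array (hypothetical helper).

    DataFrames are converted with .to_numpy(), which drops the index
    and column labels; anything else goes through np.asarray.
    """
    if isinstance(data, pd.DataFrame):
        return data.to_numpy()
    return np.asarray(data)


# Stand-in data: two feature columns (0, 1) plus a score column (2),
# processed the same way as `toexplain` in the question.
toexplain = pd.DataFrame(
    [[0.1, 0.2, 0.9],
     [0.3, 0.4, 0.5],
     [0.5, 0.6, 0.7]]
)
toexplain = toexplain.sort_values(by=[2], ascending=False).iloc[0:2]
toexplain = toexplain.drop(columns=[2])

# The index is no longer 0..n-1 after sorting and slicing.
shuffled_index = list(toexplain.index)

# Plain array: positional rows only, no labels.
toexplain_np = as_array(toexplain)

# With the real objects from the question, the calls would then be:
# explainer = shap.DeepExplainer(new_model, as_array(df_sampled_X_train_scaled))
# shap_values = explainer.shap_values(as_array(toexplain), check_additivity=False)
```

Converting both the background data and the rows to explain keeps the two inputs the same type, so nothing downstream depends on pandas label handling.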

Answered 2020-04-07T06:26:34.873