I would like to know why DALEX's model_performance() and tidymodels' collect_metrics() do not report the same accuracy. Do they use different measures, or different methods? I put together the following example code:
library(tidymodels)
library(parsnip)
library(DALEXtra)
set.seed(1)
x1 <- rbinom(1000, 5, .1)
x2 <- rbinom(1000, 5, .4)
x3 <- rbinom(1000, 5, .9)
x4 <- rbinom(1000, 5, .6)
id <- c(1:1000)
y <- as.factor(rbinom(1000, 5, .5))
df <- tibble(y, x1, x2, x3, x4, id)
# create training and test set
set.seed(20)
split_dat <- initial_split(df, prop = 0.8)
train <- training(split_dat)
test <- testing(split_dat)
# cross-validation folds (note: created on the full df, and not actually used below)
kfolds <- vfold_cv(df)
# recipe
rec_pca <- recipe(y ~ ., data = train) %>%
  update_role(id, new_role = "id variable") %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  # when threshold is set, step_pca() ignores num_comp
  step_pca(x1, x2, x3, threshold = 0.9, num_comp = 1)
# parsnip model specification
boost_model <- boost_tree() %>%
  set_mode("classification") %>%
  set_engine("xgboost")
# create workflow
boosted_wf <- workflow() %>%
  add_model(boost_model) %>%
  add_recipe(rec_pca)
# fit on the training split and evaluate on the held-out test split
boosted_res <- last_fit(boosted_wf, split_dat)
collect_metrics(boosted_res)
The output of collect_metrics() shows an accuracy of 0.31:
# A tibble: 2 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy multiclass 0.31 Preprocessor1_Model1
2 roc_auc hand_till 0.512 Preprocessor1_Model1
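As a sanity check, here is a minimal sketch of my understanding (the manual refit below is my own addition, not part of the pipeline above): last_fit() fits the workflow on the training split and computes its metrics on the held-out test split only, so predicting the test set by hand should give roughly the same accuracy.
# my own check: refit on the training split and score the test split,
# which is what last_fit() does internally
manual_fit <- fit(boosted_wf, train)
predict(manual_fit, test) %>%
  bind_cols(test["y"]) %>%
  accuracy(truth = y, estimate = .pred_class)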
Continuing on, I prepare the DALEX model explainer:
# fit the workflow on the full data set
final_boosted <- generics::fit(boosted_wf, df)
# create an explanation object
explainer_xgb <- DALEXtra::explain_tidymodels(final_boosted,
                                              data = df[, -1],
                                              y = df$y)
perf <- model_performance(explainer_xgb)
perf
Now, this gives the following output for the overall fit:
Measures for: multiclass
micro_F1 : 0.43
macro_F1 : 0.5743392
w_macro_F1 : 0.4775901
accuracy : 0.43
w_macro_auc: 0.7064296
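For comparison, a minimal sketch (again my own check, not DALEX code) that scores final_boosted on the full df, i.e. the same data/y pair that was handed to explain_tidymodels():
# my own check: score the full-data fit on the same full data set
# that explain_tidymodels() received
predict(final_boosted, df) %>%
  bind_cols(df["y"]) %>%
  accuracy(truth = y, estimate = .pred_class)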
Note that the accuracy is 0.43 using model_performance() but 0.31 using collect_metrics(). Does anyone know why that is?