
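I would like to calculate confidence intervals for the area under the curve (AUC) and the cross-validated (cv) AUC in mlr3.

I understand that for regression tasks this can be done via predict_type = "se".

I would like to know how to do this for AUC/cvAUC within mlr3 (a cvAUC solution outside of mlr3 is presented in the update below).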

Example data:

# library
library(mlr3verse)
library(mlbench)

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# make task
all.task <- TaskClassif$new("all.data", data, target = "diabetes")

#make a learner 
learner <- lrn("classif.log_reg", predict_type = "prob")

# resample 
rr = resample(all.task, learner, rsmp("cv"))
#> INFO  [12:19:45.662] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 5/10) 
#> INFO  [12:19:45.741] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 8/10) 
#> INFO  [12:19:45.780] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 10/10) 
#> INFO  [12:19:45.805] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 2/10) 
#> INFO  [12:19:45.831] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 6/10) 
#> INFO  [12:19:45.859] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 1/10) 
#> INFO  [12:19:45.899] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 9/10) 
#> INFO  [12:19:45.926] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 7/10) 
#> INFO  [12:19:45.954] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 3/10) 
#> INFO  [12:19:45.995] [mlr3]  Applying learner 'classif.log_reg' on task 'all.data' (iter 4/10)

# get AUC
rr$aggregate(msr("classif.auc"))
#> classif.auc 
#>   0.8297186

Created on 2021-04-02 by the reprex package (v1.0.0)

Update:

Outside of mlr3 I would do this with the cvAUC package:

library(cvAUC)
library(tidyverse)

# extract predictions
rr$predictions() -> cv_pred_model

# prepare data for cv ci
cv_pred_model %>%
  map(.,as.data.table) %>% 
  map_df(~as.data.frame(.x), .id="fold") -> go

# calculate ci cv
# the probability column is named after the class level ("pos"), not "1"
ci.cvAUC(predictions=go$prob.pos,labels=go$truth,folds=go$fold,confidence=0.95)

1 Answer

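Currently, mlr3 has no built-in way to compute the uncertainty of the AUC that is as convenient as computing the measure itself (i.e. there is no equivalent of $aggregate() for it). Instead, you can call cvAUC::ci.cvAUC and supply it with the data it needs:

The ResampleResult object rr has a method $predictions(), which gives you the ground truth and the predicted scores for each resampling fold. You can use data.table::rbindlist() with idcol set to get a single table containing all ground truths, all predictions, and an indicator of the resampling fold (for this, you first have to convert the Prediction objects to data.table). That is all the information ci.cvAUC needs.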

print(rr$predictions())
#> [[1]]
#> <PredictionClassif> for 77 observations:
#>     row_ids truth response   prob.neg   prob.pos   
#>           2   neg      neg 0.94955791 0.05044209
#>           6   neg      neg 0.85101781 0.14898219
#>          13   neg      pos 0.22516526 0.77483474
#> ---
#>         744   pos      pos 0.33871290 0.66128710
#>         745   neg      pos 0.06836943 0.93163057
#>         755   pos      pos 0.27998597 0.72001403
#>
#> [[2]]
#> <PredictionClassif> for 77 observations:
#>     row_ids truth response  prob.neg  prob.pos
#>          18   pos      neg 0.8050657 0.1949343               
#> [....]

predictiontables <- lapply(rr$predictions(), data.table::as.data.table)
allpred <- data.table::rbindlist(predictiontables, idcol = "fold")
print(allpred)
#>      fold row_ids truth response  prob.neg   prob.pos
#>   1:    1       2   neg      neg 0.9495579 0.05044209
#>   2:    1       6   neg      neg 0.8510178 0.14898219
#>   3:    1      13   neg      pos 0.2251653 0.77483474
#>   4:    1      37   neg      pos 0.3366958 0.66330422
#>   5:    1      41   neg      pos 0.2578118 0.74218818
#>  ---
#> 764:   10     739   neg      neg 0.8232726 0.17672735
#> 765:   10     746   neg      neg 0.6842442 0.31575585
#> 766:   10     749   pos      pos 0.1735568 0.82644319
#> 767:   10     759   neg      neg 0.8184856 0.18151445
#> 768:   10     763   neg      neg 0.9075691 0.09243093

cvAUC::ci.cvAUC(predictions = allpred$prob.pos,
  labels = allpred$truth, folds = allpred$fold)
#> $cvAUC
#> [1] 0.8315585
#> 
#> $se
#> [1] 0.01511107
#> 
#> $ci
#> [1] 0.8019414 0.8611757
#> 
#> $confidence
#> [1] 0.95
#> 
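
The interval printed above is consistent with a normal-approximation interval around the point estimate, i.e. cvAUC ± z × se. The base R sketch below only redoes that arithmetic with the numbers copied from the output. It assumes a two-sided 95% interval, the clipping to [0, 1] is an assumption about how such an interval would sensibly be bounded, and it does not reproduce how ci.cvAUC estimates the standard error itself.

```r
# Numbers copied from the ci.cvAUC output above
cv_auc     <- 0.8315585
se         <- 0.01511107
confidence <- 0.95

# Two-sided standard normal quantile for the chosen confidence level
z <- qnorm(1 - (1 - confidence) / 2)

# Normal-approximation interval, clipped to [0, 1] because an AUC
# cannot lie outside that range (assumption, see lead-in)
ci <- c(max(cv_auc - z * se, 0), min(cv_auc + z * se, 1))
print(round(ci, 4))
```

Up to rounding, this gives the same bounds as the $ci element shown above.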

If you prefer concise magrittr code, the equivalent is:

library("data.table")
library("magrittr")

rr$predictions() %>%
  lapply(as.data.table) %>%
  rbindlist(idcol = "fold") %$%
  cvAUC::ci.cvAUC(predictions = prob.pos, labels = truth, folds = fold)
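
The last step of the pipeline above uses %$%, magrittr's exposition pipe, which is why prob.pos, truth and fold can be written without a table prefix. As a minimal illustration, here is the same operator on a toy data frame; the names toy, score, mean_score and n_folds and all values are made up for this sketch.

```r
library(magrittr)

# Toy table standing in for the combined predictions (values are made up)
toy <- data.frame(score = c(0.2, 0.9, 0.6), fold = c(1, 1, 2))

# %$% exposes the columns of the left-hand side as variables
# inside the right-hand expression
mean_score <- toy %$% mean(score)
n_folds    <- toy %$% length(unique(fold))

print(mean_score)
print(n_folds)
```

With the ordinary %>% pipe, the table would instead be passed as the first argument, so the column names would not be visible to ci.cvAUC.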

Note that, because of random variation in the resampling splits, I get a different AUC value than the OP. rr$aggregate() agrees with cvAUC here:

rr$aggregate(msr("classif.auc"))
#> classif.auc
#>   0.8315585 
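
The agreement is plausible because, as far as I know, $aggregate() by default averages the per-fold values of a measure (macro averaging), and cvAUC reports the mean of the fold-wise AUCs. The sketch below checks the mlr3 side on a different setup so that it needs only mlr3 itself: the sonar task, the classif.rpart learner, 5 folds and the seed are choices made for this illustration, not part of the original example.

```r
library(mlr3)

set.seed(1)  # fix the resampling split so the run is repeatable

# Illustration setup (assumption: any binary task and probabilistic learner would do)
task    <- tsk("sonar")
learner <- lrn("classif.rpart", predict_type = "prob")
rr      <- resample(task, learner, rsmp("cv", folds = 5))

# AUC per resampling fold, and the aggregated value
per_fold_auc <- rr$score(msr("classif.auc"))$classif.auc
macro_auc    <- rr$aggregate(msr("classif.auc"))

# With the default aggregation, the two should coincide
print(all.equal(unname(macro_auc), mean(per_fold_auc)))
```

Setting a seed before resample() also removes the run-to-run differences mentioned above.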
answered 2021-04-07T17:06:09.357