我正在尝试为我的数据框中的每个组计算 AUC、Precision、Recall、Accuracy(我有一个数据框,它连接了来自三个不同模型的预测数据)。
tidyverse 的语法是什么?我想使用 Max Kuhn 的yardstick包来计算这些指标。
这是一个示例 df ,这是我到目前为止的位置:
> library(tidyverse)
> library(yardstick)
>
> sample_df <- data_frame(
+ group_type = rep(c('a', 'b', 'c'), each = 5), # repeats each element 5 times
+ true_label = as.factor(rbinom(15, 1, 0.3)), # generates 1 with 30% prob
+ pred_prob = runif(15, 0, 1) # generates 15 decimals between 0 and 1 from uniform dist
+ ) %>%
+ mutate(pred_label = as.factor(if_else(pred_prob > 0.5, 1, 0)))
>
> sample_df
# A tibble: 15 x 4
group_type true_label pred_prob pred_label
<chr> <fct> <dbl> <fct>
1 a 1 0.327 0
2 a 1 0.286 0
3 a 0 0.0662 0
4 a 0 0.993 1
5 a 0 0.835 1
6 b 0 0.975 1
7 b 0 0.436 0
8 b 0 0.585 1
9 b 0 0.478 0
10 b 1 0.541 1
11 c 1 0.247 0
12 c 0 0.608 1
13 c 0 0.215 0
14 c 0 0.937 1
15 c 0 0.819 1
>
指标:
> # metrics for the full data
> precision(sample_df, truth = true_label, estimate = pred_label)
[1] 0.5714286
> recall(sample_df, truth = true_label, estimate = pred_label)
[1] 0.3636364
> accuracy(sample_df, truth = true_label, estimate = pred_label)
[1] 0.3333333
> roc_auc(sample_df, truth = true_label, pred_prob)
[1] 0.7727273
>
现在我如何为我的数据集中的每个组获取这些指标?
sample_df %>%
group_by(group_type) %>%
summarize(???)