我想做的事
我正在尝试建立一个模型tidymodels
来预测药物对细胞系(如细菌)的功效。该模型将根据给定细胞系的功效对药物进行排名,因此我想使用 Spearman 相关性 (ρ) 作为指标。在下面的示例数据集中,每个细胞系(列Sample
)由一个字母 表示,Q, R, S, ..., Z
每个样本用 50 种药物处理。
当我拆分数据进行交叉验证时,每个折叠的训练/测试拆分将具有 >1 个细胞系(例如Q, R
,在折叠 1 的测试拆分中),但在计算度量(ρ)时,我想计算它每个细胞系单独,然后对测试拆分中的所有细胞系取平均值,而不是对所有观察结果进行聚合。例如,如果折叠 1 的测试拆分由 组成Q, R
,那么我想计算 50 种测试药物的 ρ,然后计算 50 种药物测试Q
的单独 ρ R
,平均这两个 ρ,并将该平均值作为计算的度量折叠 1。
我试过的
我在想我必须计算按Sample
列分组的 tibble/data.frame 上的指标,但我不知道如何将该变量传递到tune_grid()
. 我不认为我可以在add_formula()
创建工作流对象时包含该变量,因为我不希望它作为预测变量。我昨天刚刚发现了 tidymodels,所以也许有一个我不知道的简单解决方案,但到目前为止我还没有在谷歌上找到任何东西。下面的代码是我尝试过的,但显然它不起作用。提前感谢您提供的任何建议。
错误
i Resample1: preprocessor 1/1
✓ Resample1: preprocessor 1/1
i Resample1: preprocessor 1/1, model 1/20
✓ Resample1: preprocessor 1/1, model 1/20
i Resample1: preprocessor 1/1, model 1/20 (predictions)
x Resample1: internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm ...
i Resample2: preprocessor 1/1
✓ Resample2: preprocessor 1/1
i Resample2: preprocessor 1/1, model 1/20
✓ Resample2: preprocessor 1/1, model 1/20
i Resample2: preprocessor 1/1, model 1/20 (predictions)
x Resample2: internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm ...
i Resample3: preprocessor 1/1
✓ Resample3: preprocessor 1/1
i Resample3: preprocessor 1/1, model 1/20
✓ Resample3: preprocessor 1/1, model 1/20
i Resample3: preprocessor 1/1, model 1/20 (predictions)
x Resample3: internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm ...
i Resample4: preprocessor 1/1
✓ Resample4: preprocessor 1/1
i Resample4: preprocessor 1/1, model 1/20
✓ Resample4: preprocessor 1/1, model 1/20
i Resample4: preprocessor 1/1, model 1/20 (predictions)
x Resample4: internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm ...
i Resample5: preprocessor 1/1
✓ Resample5: preprocessor 1/1
i Resample5: preprocessor 1/1, model 1/20
✓ Resample5: preprocessor 1/1, model 1/20
i Resample5: preprocessor 1/1, model 1/20 (predictions)
x Resample5: internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm ...
Warning message:
All models failed. See the `.notes` column.
运行时glmnet_tuning_results
:
Warning message:
This tuning result has notes. Example notes on model fitting include:
internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm = ~na_rm)
internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm = ~na_rm)
internal: Error: In metric: `spearman_cor`
unused arguments (truth = ~TargetVariable, estimate = ~.pred, na_rm = ~na_rm)
代码
示例数据集
data = tibble(
Sample = rep(LETTERS[17:26], each = 50),
TargetVariable = rnorm(500, mean = 0, sd = 1),
PredictorVariable1 = rnorm(500, mean = 5, sd = 1),
PredictorVariable2 = rpois(500, lambda = 5)
)
模型
# Splitting for cross-validation.
set.seed(1026)
folds = group_vfold_cv(data, group = Sample, v = 5)
# Model specification.
glmnet_model = linear_reg(
mode = "regression",
penalty = tune(),
mixture = tune()
) %>%
set_engine("glmnet")
# Workflow.
glmnet_wf = workflow() %>%
add_model(glmnet_model) %>%
add_formula(TargetVariable ~ . - Sample)
# Grid specification.
glmnet_params = parameters(penalty(), mixture())
set.seed(1026)
glmnet_grid = grid_max_entropy(glmnet_params, size = 20)
# Hyperparameter tuning.
glmnet_tuning_results = tune_grid(
glmnet_wf,
resamples = folds,
grid = glmnet_grid,
metrics = metric_set(spearman_cor),
control = control_grid(verbose = TRUE)
)
glmnet_tuning_results %>% show_best(n = 10)
自定义指标
# Vector version.
spearman_cor_vec = function(truth, estimate, na_rm = TRUE) {
spearman_cor_impl = function(truth, estimate) {
cor(truth, estimate, method = "spearman")
}
metric_vec_template(
metric_impl = spearman_cor_impl,
truth = truth,
estimate = estimate,
na_rm = na_rm,
cls = "numeric"
)
}
# Data frame version.
spearman_cor = function(data) {
UseMethod("spearman_cor")
}
spearman_cor = new_numeric_metric(spearman_cor, direction = "maximize")
spearman_cor.data.frame = function(data, truth, estimate, na_rm = TRUE) {
data_grouped = data %>%
group_by(Sample)
metric_summarizer(
metric_nm = "spearman_cor",
metric_fn = spearman_cor_vec,
data = data_grouped,
truth = !! enquo(truth),
estimate = !! enquo(estimate),
na_rm = na_rm
)
}
会话信息
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.3 (2020-02-29)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Chicago
#> date 2021-08-25
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.2)
#> cli 3.0.1 2021-07-17 [1] CRAN (R 3.6.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 3.6.2)
#> knitr 1.27 2020-01-16 [1] CRAN (R 3.6.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 3.6.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 3.6.2)
#> pillar 1.6.2 2021-07-29 [1] CRAN (R 3.6.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 3.6.2)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> styler 1.5.1 2021-07-13 [1] CRAN (R 3.6.2)
#> tibble 3.1.3 2021-07-23 [1] CRAN (R 3.6.2)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 3.6.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 3.6.2)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library