I'm trying to use a function that calls on the pROC package in R to calculate the area under the curve for a number of different outcomes.

# Function used to compute area under the curve
proc_auc <- function(outcome_var, predictor_var) {
            pROC::auc(outcome_var, predictor_var)}

To do this, I am intending to refer to outcome names in a vector (much like below).

# Create a vector of outcome names 
outcome <- c('outcome_1', 'outcome_2')

However, I am having problems defining the variables to pass into this function. When I try, I get the error: "Error in roc.default(response, predictor, auc = TRUE, ...): 'response' must have two levels". I can't work out why, as I believe my outcome only has two levels...

I would be so happy if anyone could help me!

Here is a reproducible example using the iris dataset in R.

library(pROC)
library(datasets)
library(dplyr)

# Use iris dataset to generate binary variables needed for function
df <- iris %>%
  dplyr::mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
                outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  dplyr::rename(predictor_1 = Petal.Width)

# Inspect binary outcome variables 
df %>% group_by(outcome_1) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))
df %>% group_by(outcome_2) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))

# Function used to compute area under the curve
proc_auc <- function(outcome_var, predictor_var) {
            pROC::auc(outcome_var, predictor_var)}

# Create a vector of outcome names 
outcome <- c('outcome_1', 'outcome_2')

# Define variables to go into function
outcome_var <- df %>% dplyr::select(outcome[[1]])
predictor_var <- df %>% dplyr::select(predictor_1)


# Use function - first line works but not last line! 
proc_auc(df$outcome_1, df$predictor_1)
proc_auc(outcome_var, predictor_var)

2 Answers

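outcome_var and predictor_var are data frames with one column each, which means they cannot be passed directly as arguments to auc, which expects vectors.

Just specify the column names and it will work.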

proc_auc(outcome_var$outcome_1, predictor_var$predictor_1)
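Since the question wants to drive this from a vector of outcome names, here is a minimal sketch of one way to do that with standard evaluation. It rebuilds the question's `df` so it runs on its own; the name `aucs` and the use of `suppressMessages()` (to hide pROC's informational messages about levels and direction) are my own choices, not part of the question.

```r
library(pROC)
library(dplyr)

# Rebuild the question's data frame so this block is self-contained
df <- iris %>%
  mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
         outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  rename(predictor_1 = Petal.Width)

outcome <- c("outcome_1", "outcome_2")

# select() keeps the data frame wrapper, while [[ returns the plain vector
is.data.frame(df %>% select(all_of(outcome[[1]])))  # TRUE
is.numeric(df[[outcome[[1]]]])                      # TRUE

# Loop over the outcome names, extracting each column as a vector with [[
aucs <- sapply(outcome, function(nm) {
  as.numeric(suppressMessages(pROC::auc(df[[nm]], df$predictor_1)))
})
aucs
```

The result is a named numeric vector with one AUC per outcome name.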
Answered 2021-12-08T12:56:28.080

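You need to get familiar with dplyr's non-standard evaluation, which makes programming with it quite difficult. In particular, you need to realize that passing a variable name is an indirection, and that there is a special syntax for it.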
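As a small illustration of that indirection (a sketch only, using the built-in iris data and a hypothetical variable `col`): inside a dplyr verb a bare name refers to a column, whereas a column name stored in a string has to be looked up explicitly, for example through the `.data` pronoun.

```r
library(dplyr)

col <- "Sepal.Length"  # hypothetical: a column name held in a variable

# Indirect: look the column up by the string stored in `col`
by_name <- iris %>% summarise(m = mean(.data[[col]]))

# Direct: refer to the column by its bare name
direct <- iris %>% summarise(m = mean(Sepal.Length))

# Both approaches compute the same mean
all.equal(by_name$m, direct$m)
```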

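If you want to stay with pipes and non-standard evaluation, you can use the roc_ function, which follows an earlier naming convention for functions that take variable names as input instead of the actual columns.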

proc_auc2 <- function(data, outcome_var, predictor_var) {
    pROC::auc(pROC::roc_(data, outcome_var, predictor_var))
}

At this point you can pass the column names, as strings, to this new function:

proc_auc2(df, outcome[[1]], "predictor_1")
# or equivalently:
df %>% proc_auc2(outcome[[1]], "predictor_1")
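Building on this, a sketch of how `proc_auc2` could be applied to every name in the question's `outcome` vector. It assumes an installed pROC version that provides `roc_`; the name `auc_by_outcome` and the `suppressMessages()` call are illustrative additions, and `df` is rebuilt from the question so the block runs on its own.

```r
library(pROC)
library(dplyr)

# Rebuild the question's data frame so this block is self-contained
df <- iris %>%
  mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
         outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  rename(predictor_1 = Petal.Width)

# Same wrapper as above: takes column names as strings
proc_auc2 <- function(data, outcome_var, predictor_var) {
  pROC::auc(pROC::roc_(data, outcome_var, predictor_var))
}

outcome <- c("outcome_1", "outcome_2")

# One AUC per outcome name; vapply enforces a single numeric per call
auc_by_outcome <- vapply(
  outcome,
  function(nm) as.numeric(suppressMessages(proc_auc2(df, nm, "predictor_1"))),
  numeric(1)
)
auc_by_outcome
```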

That said, for most use cases you probably want to follow @druskacik's answer and use standard R evaluation.

Answered 2021-12-08T14:54:59.287