
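I'm using ranger to fit a random forest. As the evaluation metric I'm using the ROC AUC score, computed with cvAUC. After making predictions, when I try to evaluate the AUC score I get an error: Format of predictions is invalid. It couldn't be coerced to a list. I think this happens because the predictions include a Levels part that lists the unique levels of the predictions, but I can't get rid of that part. Below is a minimal reproducible example that raises the error: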

library(caret)
library(ranger)
# install.packages("cvAUC")  # run once if cvAUC is not installed yet
library(cvAUC)

# Columns for training set
cat.column <- c("cat", "dog", "monkey", "shark", "seal")
num.column <- c(1,2,5,7,9)
class <- c(0,1,0,0,1)

train.set <- data.frame(num.column, cat.column, class)

# Columns for test set
cat.column <- c("cat", "elephant-shrew", "monkey", "monkey", "seal")
num.column <- c(1,11,5,6,8)
class <- c(1,0,1,0,1)

test.set <- data.frame(num.column, cat.column, class)

# Drop the target variable from the test set
target.test <- test.set["class"]
test.set <- test.set[,!names(test.set) %in% "class"]

# Fit random forest
rf = ranger(formula = as.factor(class) ~ . , data = train.set, verbose = FALSE)
# Get predictions
pred <- predict(rf, test.set)
predictions <- pred$predictions

# Get AUC score (this call raises the error)
auc <- AUC(as.factor(predictions), as.factor(unlist(target.test)), label.ordering = NULL)

cat(auc)


1 Answer


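You get the error because AUC expects a numeric vector of predictions, not a factor. In addition, in this example a new level (elephant-shrew) appears in the column cat.column of the test set. It is best to supply every value an input variable can take in both the training and the test set.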
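One way to follow that advice is to give a categorical column the same set of levels in both data frames. The sketch below uses a hypothetical helper, align_levels (not part of ranger, caret or cvAUC), that builds the union of the values seen in either set and uses it as the factor levels for both:

```r
# Hypothetical helper (not from any package): give a categorical column
# the same set of levels in the training and test data.
align_levels <- function(train, test, column) {
  # Union of the values seen in either data set, in order of first appearance
  all.levels <- union(as.character(train[[column]]), as.character(test[[column]]))
  train[[column]] <- factor(train[[column]], levels = all.levels)
  test[[column]]  <- factor(test[[column]],  levels = all.levels)
  list(train = train, test = test)
}

train.set <- data.frame(num.column = c(1, 2, 5, 7, 9),
                        cat.column = c("cat", "dog", "monkey", "shark", "seal"),
                        stringsAsFactors = FALSE)
test.set <- data.frame(num.column = c(1, 11, 5, 6, 8),
                       cat.column = c("cat", "elephant-shrew", "monkey", "monkey", "seal"),
                       stringsAsFactors = FALSE)

aligned <- align_levels(train.set, test.set, "cat.column")
levels(aligned$train$cat.column)
# [1] "cat"            "dog"            "monkey"         "shark"
# [5] "seal"           "elephant-shrew"
```

Both factors now carry elephant-shrew as a level, even though it only occurs in the test set, so no test value falls outside the declared levels.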

library(caret)
library(cvAUC)
library(ranger)
# Columns for training set
cat.column <- c("cat", "dog", "monkey", "shark", "seal")
num.column <- c(1,2,5,7,9)
class <- factor(c(0,1,0,0,1),levels = c(0,1))

train.set <- data.frame(num.column, cat.column, class, stringsAsFactors = FALSE)
# Columns for test set
cat.column <- c("cat", "elephant-shrew", "monkey", "monkey", "seal")
num.column <- c(1,11,5,6,8)
class <- factor(c(1,0,1,0,1), levels = c(0,1))

test.set <- data.frame(num.column, cat.column, class, stringsAsFactors = FALSE)

# Drop the target variable from the test set
target.test <- test.set["class"]
test.set <- test.set[,!names(test.set) %in% "class"]

# Fit random forest
rf = ranger(formula = class ~ . , data = train.set, verbose = FALSE)
# Get predictions
pred <- predict(rf, test.set)
predictions <- pred$predictions

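# Get AUC score: AUC() needs numeric predictions, so convert the factor.
# as.numeric() on a factor returns its integer codes (1 for "0", 2 for "1"),
# which keep the class ordering.
auc <- AUC(as.numeric(predictions), target.test$class, label.ordering = NULL)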
cat(auc)
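A note on the as.numeric(predictions) call: on a factor, as.numeric() returns the internal level codes rather than the labels. The small example below (with made-up predictions standing in for pred$predictions) shows the difference and why it does not change a rank-based metric such as AUC here; if you need the actual 0/1 values, convert through character first.

```r
# Made-up hard class predictions, standing in for pred$predictions above
predictions <- factor(c("1", "0", "1", "1", "0"), levels = c("0", "1"))

codes  <- as.numeric(predictions)                # internal codes: 2 1 2 2 1
values <- as.numeric(as.character(predictions))  # original labels: 1 0 1 1 0

# Both vectors order the observations identically, so a rank-based
# metric such as AUC gives the same result for either one.
identical(rank(codes), rank(values))
# [1] TRUE
```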

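As you can see, I changed the data-preparation steps slightly. First, if your class column is the outcome of a classification task, it is best to convert it to a factor as early as possible. Second, if the training and test sets do not share the same values of a character variable (as in your example, where cat.column in the test set contains elephant-shrew, a value that does not appear in the training set), it is better to handle that variable as a character. In this case you can use stringsAsFactors = FALSE to keep character variables as characters.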
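Hard 0/1 predictions give the AUC only a single threshold to work with. As a sketch, assuming you would rather score the test set with class probabilities, ranger's probability = TRUE argument grows a probability forest, and predict() then returns a matrix with one column per class level; the column for the positive class "1" is a numeric score that can go straight into cvAUC's AUC(). The min.node.size = 1 setting is an assumption for this five-row toy data, since the default minimum node size for probability forests is larger and would leave such a tiny training set largely unsplit. With this little data the AUC itself means nothing; the example only shows the mechanics.

```r
library(ranger)
library(cvAUC)

set.seed(42)  # ranger draws random bootstrap samples; fix the seed for reproducibility

train.set <- data.frame(num.column = c(1, 2, 5, 7, 9),
                        cat.column = c("cat", "dog", "monkey", "shark", "seal"),
                        class = factor(c(0, 1, 0, 0, 1), levels = c(0, 1)),
                        stringsAsFactors = FALSE)
test.set <- data.frame(num.column = c(1, 11, 5, 6, 8),
                       cat.column = c("cat", "elephant-shrew", "monkey", "monkey", "seal"),
                       stringsAsFactors = FALSE)
target.test <- factor(c(1, 0, 1, 0, 1), levels = c(0, 1))

# probability = TRUE grows a probability forest instead of a classification forest
rf <- ranger(class ~ ., data = train.set, probability = TRUE,
             min.node.size = 1, verbose = FALSE)

# For a probability forest, predictions is a matrix with one column per class level
prob <- predict(rf, test.set)$predictions
score <- prob[, "1"]  # estimated probability of the positive class "1"

auc <- AUC(score, target.test)
cat(auc)
```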

answered 2021-05-07T10:38:57.640