r - Using R, is there a way to train and cross-validate a random forest with the F1 score?

Question

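I have class-imbalanced data: the response variable has two classes, and one is far more common than the other. In this situation accuracy does not seem to be a good metric for training a model, since I can reach 99% accuracy while misclassifying the minority class entirely. I think the F1 score would be more useful.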
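To make the accuracy problem concrete, here is a minimal base-R sketch. The labels are made up for illustration (990 majority cases, 10 minority cases), and the "classifier" simply predicts the majority class every time. Treating an undefined precision or recall as 0 is a convention assumed here, not something taken from any library.

```r
# Hypothetical labels: 990 majority ("neg") and 10 minority ("pos") cases.
obs  <- factor(c(rep("neg", 990), rep("pos", 10)), levels = c("pos", "neg"))
# A degenerate classifier that always predicts the majority class.
pred <- factor(rep("neg", 1000), levels = c("pos", "neg"))

accuracy <- mean(pred == obs)

# Confusion-matrix counts with "pos" as the positive (minority) class.
tp <- sum(pred == "pos" & obs == "pos")
fp <- sum(pred == "pos" & obs == "neg")
fn <- sum(pred == "neg" & obs == "pos")

# Assumed convention: an undefined ratio (0 / 0) is reported as 0.
precision <- if (tp + fp > 0) tp / (tp + fp) else 0
recall    <- if (tp + fn > 0) tp / (tp + fn) else 0
f1 <- if (precision + recall > 0) {
  2 * precision * recall / (precision + recall)
} else {
  0
}

cat("accuracy:", accuracy, " F1:", f1, "\n")
```

Accuracy comes out high because the majority class dominates the count, while F1 collapses because no minority case is ever found.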

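Has anyone tried using the F1 score as the training metric in R? I modified the iris dataset so that Species is a binary variable and ran a random forest on it. Can someone help me debug this?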

library(caret)
library(randomForest) 

data(iris)

iris$Species = ifelse(iris$Species == "setosa", "a", "b") 

iris$Species = as.factor(iris$Species) 

f1 <- function (data, lev = NULL, model = NULL) {
                precision <- posPredValue(data$pred, data$obs, positive = "pass")
                recall <- sensitivity(data$pred, data$obs, postive = "pass") 
                f1_val <- (2 * precision * recall) / (precision + recall) 
                names(f1_val) <- c("F1")
                f1_val }


train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3, 
                              classProbs = TRUE,
                              #sampling = "smote", 
                              summaryFunction = f1,
                             search = "grid")
tune.grid <- expand.grid(.mtry = seq(from = 1, to = 10, by = 1)) 

random.forest.orig <- train(Species ~ ., data = iris,
                            method = "rf",
                            tuneGrid = tune.grid,
                            metric = "F1",
                            trControl = train.control)

This gives the following error:

Something is wrong; all the F1 metric values are missing:
       F1     
 Min.   : NA  
 1st Qu.: NA  
 Median : NA  
 Mean   :NaN  
 3rd Qu.: NA  
 Max.   : NA  
 NA's   :10   
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
5: stop("Stopping", call. = FALSE)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(Species ~ ., data = iris, method = "rf", tuneGrid = tune.grid, 
       metric = "F1", trControl = train.control)
1: train(Species ~ ., data = iris, method = "rf", tuneGrid = tune.grid, 
       metric = "F1", trControl = train.control)
> warnings()
Warning messages:
1: In randomForest.default(x, y, mtry = param$mtry, ...) :
  invalid mtry: reset to within valid range
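A possible reading of this output, offered as a sketch rather than a confirmed answer: the summary function asks for `positive = "pass"`, but the recoded factor only has the levels "a" and "b", so the precision (and therefore F1) comes back as NA in every resample. The second call also spells the argument `postive`, which is silently ignored. The `invalid mtry` warnings are consistent with a grid that goes up to 10 while iris has only 4 predictors. The version below assumes the first factor level (`lev[1]`, here "a") is the class of interest and limits `mtry` to 1 through 4. The seed and the name `random.forest.f1` are arbitrary choices for this illustration.

```r
library(caret)
library(randomForest)

data(iris)
# Same binary recoding as in the question: "a" = setosa, "b" = everything else.
iris$Species <- factor(ifelse(iris$Species == "setosa", "a", "b"))

# Summary function returning a named F1 value.
# Assumption: the first factor level (lev[1], here "a") is the positive class.
f1 <- function(data, lev = NULL, model = NULL) {
  precision <- posPredValue(data$pred, data$obs, positive = lev[1])
  recall    <- sensitivity(data$pred, data$obs, positive = lev[1])
  f1_val    <- (2 * precision * recall) / (precision + recall)
  c(F1 = f1_val)
}

train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              classProbs = TRUE,
                              summaryFunction = f1,
                              search = "grid")

# iris has 4 predictors, so mtry can only range from 1 to 4.
tune.grid <- expand.grid(mtry = 1:4)

set.seed(42)  # arbitrary seed, only for reproducibility
random.forest.f1 <- train(Species ~ ., data = iris,
                          method = "rf",
                          tuneGrid = tune.grid,
                          metric = "F1",
                          trControl = train.control)

# Resampled F1 for each candidate mtry.
print(random.forest.f1$results[, c("mtry", "F1")])
```

On real imbalanced data, a fold in which the model never predicts the positive class can still yield an NA for precision, so the summary function may need an explicit fallback value for that case.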

Source: Training Model in Caret Using F1 Metric
