
I am trying to train about 15 machine learning models, using recipes (for consistent preprocessing) and caret (for consistent training). The only two models that consistently give me the error "Something is wrong; all the Accuracy metric values are missing" are the ones from the partykit package — cforest and ctree. Below I reproduce the error with the PimaIndiansDiabetes dataset from mlbench.

library(caret)
library(recipes)
library(mlbench); data(PimaIndiansDiabetes)
my_rec <- recipe(diabetes ~ ., data = PimaIndiansDiabetes) %>%
  step_dummy(all_nominal(), -diabetes) %>%
  step_nzv(all_predictors())

fitControl5 <- trainControl(summaryFunction = twoClassSummary, 
                             verboseIter = TRUE, 
                             savePredictions =  TRUE, 
                             sampling = "smote", 
                             method = "repeatedcv", 
                             number= 5, 
                             repeats = 1,
                             classProbs = TRUE)

dtree5 <- train(my_rec, data = PimaIndiansDiabetes,
                 method = "cforest",
                 metric = "Accuracy",
                 tuneLength = 8,
                 trainControl = fitControl5)

note: only 7 unique complexity parameters in default grid. Truncating the grid to 7 .

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :7     NA's   :7    
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Below is the code for method ctree:

dtree6 <- train(my_rec, data = PimaIndiansDiabetes,
                 method = "ctree",
                 metric = "Accuracy",
                 tuneLength = 8,
                 trainControl = fitControl5)
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :8     NA's   :8    
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I would sincerely appreciate your help!


1 Answer


The argument should be trControl =, not trainControl =. With the misspelled name, train() most likely forwards trainControl = fitControl5 through ... to the underlying cforest()/ctree() call, which does not accept that argument, so every resample fails and all metric values come back NA. If I run the following it works:

dtree5 <- train(my_rec, data = PimaIndiansDiabetes,
                  method = "cforest",
                  metric = "Accuracy",
                  tuneLength = 3,
                  trControl = fitControl5)

Output:

dtree5
Conditional Inference Random Forest 

768 samples
  8 predictor
  2 classes: 'neg', 'pos' 

Recipe steps: dummy, nzv 
Resampling: Cross-Validated (5 fold, repeated 1 times) 
Summary of sample sizes: 614, 615, 614, 615, 614 
Addtional sampling using SMOTE

Resampling results across tuning parameters:

  mtry  ROC        Sens   Spec     
  2     0.8298281  0.788  0.7013277
  5     0.8256038  0.794  0.7013277
  8     0.8222572  0.798  0.7276031

ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
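
The same rename should fix the ctree call as well. A minimal sketch, reusing the my_rec and fitControl5 objects from the question (untested here); the metric is set to "ROC" explicitly because fitControl5 uses twoClassSummary:

dtree6 <- train(my_rec, data = PimaIndiansDiabetes,
                method = "ctree",
                metric = "ROC",
                tuneLength = 8,
                trControl = fitControl5)

With metric = "Accuracy" and twoClassSummary, caret warns that Accuracy is not in the result set and falls back to ROC anyway, which is why the cforest output above reports ROC, Sens and Spec.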
answered 2021-01-04T03:32:30.253