I am using the recipe() function from the tidymodels package
to impute missing values and fix imbalanced data.
Here is my data:
mer_df <- mer2 %>%
  filter(!is.na(laststagestatus2)) %>%
  select(Id, Age_Range__c, Gender__c, numberoflead, leadduration, firsttouch, lasttouch, laststagestatus2) %>%
  mutate_if(is.character, factor) %>%
  mutate_if(is.logical, as.integer)
# A tibble: 197,836 x 8
Id Age_Range__c Gender__c numberoflead leadduration firsttouch lasttouch
<fct> <fct> <fct> <int> <dbl> <fct> <fct>
1 0010~ NA NA 2 5.99 Dealer IB~ Walk in
2 0010~ NA NA 1 0 Online Se~ Online S~
3 0010~ NA NA 1 0 Walk in Walk in
4 0010~ NA NA 1 0 Online Se~ Online S~
5 0010~ NA NA 2 0.0128 Dealer IB~ Dealer I~
6 0010~ NA NA 1 0 OB Call OB Call
7 0010~ NA NA 1 0 Dealer IB~ Dealer I~
8 0010~ NA NA 4 73.9 Dealer IB~ Walk in
9 0010~ NA Male 24 0.000208 OB Call OB Call
10 0010~ NA NA 18 0.000150 OB Call OB Call
# ... with 197,826 more rows, and 1 more variable: laststagestatus2 <fct>
Here is my code:
# Preprocessing: impute, lump rare levels, dummy-encode, then rebalance
mer_rec <- recipe(laststagestatus2 ~ ., data = mer_train) %>%
  step_medianimpute(numberoflead, leadduration) %>%
  step_knnimpute(Gender__c, Age_Range__c, firsttouch, lasttouch) %>%
  step_other(Id, firsttouch) %>%
  step_other(Id, lasttouch) %>%
  step_dummy(all_nominal(), -laststagestatus2) %>%
  step_smote(laststagestatus2)

mer_rec %>% prep() %>% juice()

# Model specifications
glm_spec <- logistic_reg() %>%
  set_engine("glm")

rf_spec <- rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Workflow holding the recipe; the model is added later
mer_wf <- workflow() %>%
  add_recipe(mer_rec)

mer_metrics <- metric_set(roc_auc, accuracy, sensitivity, specificity)
Everything works fine up to this point. Now I am using the fit_resamples()
function to fit the logistic regression on each resample.
My code is as follows:
doParallel::registerDoParallel()

glm_rs <- mer_wf %>%
  add_model(glm_spec) %>%
  fit_resamples(
    resamples = mer_folds,
    metrics = mer_metrics,
    control = control_resamples(save_pred = TRUE)
  )

glm_rs
I get a warning that says:
Warning message:
All models failed in [fit_resamples()]. See the `.notes` column.
# Resampling results
# 10-fold cross-validation using stratification
# A tibble: 10 x 5
splits id .metrics .notes .predictions
<list> <chr> <list> <list> <list>
1 <split [133.5K/14.8K]> Fold01 <NULL> <tibble [1 x 1]> <NULL>
2 <split [133.5K/14.8K]> Fold02 <NULL> <tibble [1 x 1]> <NULL>
3 <split [133.5K/14.8K]> Fold03 <NULL> <tibble [1 x 1]> <NULL>
4 <split [133.5K/14.8K]> Fold04 <NULL> <tibble [1 x 1]> <NULL>
5 <split [133.5K/14.8K]> Fold05 <NULL> <tibble [1 x 1]> <NULL>
6 <split [133.5K/14.8K]> Fold06 <NULL> <tibble [1 x 1]> <NULL>
7 <split [133.5K/14.8K]> Fold07 <NULL> <tibble [1 x 1]> <NULL>
8 <split [133.5K/14.8K]> Fold08 <NULL> <tibble [1 x 1]> <NULL>
9 <split [133.5K/14.8K]> Fold09 <NULL> <tibble [1 x 1]> <NULL>
10 <split [133.5K/14.8K]> Fold10 <NULL> <tibble [1 x 1]> <NULL>
Warning message:
This tuning result has notes. Example notes on model fitting include:
recipe: Error: could not find function "all_nominal"
Does anyone have any suggestions on how to fix this? Thank you very much for your help!
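Since the warning points at the `.notes` column, a first diagnostic step is to unnest that list-column and check whether every fold failed with the same message. The following is only a minimal sketch: `toy_rs` is a hypothetical stand-in shaped like the `id` and `.notes` columns printed above, and the inner column name `note` is an assumption (it depends on the installed tune version, so check `names(glm_rs$.notes[[1]])` on the real object first).

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical stand-in for a resampling result in which every fold failed.
# Only the `id` and `.notes` columns are mimicked; the message is the one
# quoted in the warning above.
toy_rs <- tibble(
  id = c("Fold01", "Fold02"),
  .notes = list(
    tibble(note = 'recipe: Error: could not find function "all_nominal"'),
    tibble(note = 'recipe: Error: could not find function "all_nominal"')
  )
)

# Unnest the list-column so each fold's message becomes a row, then count
# distinct messages to see whether all folds share one failure reason.
note_summary <- toy_rs %>%
  select(id, .notes) %>%
  unnest(cols = .notes) %>%
  count(note, name = "n_folds")

print(note_summary)
```

On the real result, `glm_rs` would take the place of `toy_rs`. If the error only appears once a parallel backend is registered, one thing that may be worth trying (an assumption, not something confirmed in this post) is to run without `doParallel::registerDoParallel()`, or to pass `pkgs = c("recipes", "themis")` to `control_resamples()` so that the workers load the packages that provide `all_nominal()` and `step_smote()`.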