
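I'm new to tidymodels and fairly new to R. I'm trying to replicate David Robinson's code from his YouTube tidytuesday/Sliced customer churn episode, but I run into a problem when the recipe's changes are applied to the cross-validation data (the resamples).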

Problem: when I apply step_mutate() to the training data it works, but when the same recipe is applied to the cross-validation resamples (train_5fold) I get the error: Error: All of the models failed. See the .notes column.

To reproduce the problem (download the data with the following code):

train <- read.csv(url("https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/train_object.csv"))

The train_5fold cross-validation resamples can be downloaded from: https://github.com/johnsnow09/covid19-df_stack-code/blob/main/train_5fold.RDS

train_5fold <- readRDS("train_5fold.RDS")

Code:

library(tidyverse)
library(tidymodels)
mset <- metric_set(mn_log_loss)

control <- control_grid(save_workflow = TRUE,
                        save_pred = TRUE,
                        extract = extract_model)
xg_spec <- parsnip::boost_tree(
    trees = tune(),
    mtry = tune(),
    learn_rate = tune()) %>% 
  set_engine("xgboost") %>%
  set_mode("classification")


# Convert a factor to its integer level codes; the "Unknown" level becomes NA
factor_to_ordinal <- function(x){
  ifelse(x == "Unknown", NA, as.integer(x))
}


xg_rec_4 <- recipe(churned ~  .,data = train) %>% 
  
  update_role(id, new_role = "ID") %>%
  step_mutate(income_category = factor_to_ordinal(income_category),
              education_level = factor_to_ordinal(education_level)) %>% 
  step_impute_mean(all_numeric_predictors()) %>% 
  step_dummy(all_nominal_predictors())


xg_wf_4 <- workflow() %>% 
  add_recipe(xg_rec_4) %>% 
  add_model(xg_spec)


# Tune over the 5 resamples; the surplus closing parenthesis after tune_grid() is removed
xg_res_4 <- xg_wf_4 %>%  
  tune_grid(
    resamples = train_5fold,
    metrics = mset,
    control = control,
    grid = crossing(trees = seq(200, 800, 20),
                    mtry = c(2, 4, 6, 8, 10),
                    learn_rate = c(0.02))
  )

autoplot(xg_res_4)

Error: All of the models failed. See the .notes column.

The .notes column shows:

.notes
<chr>
preprocessor 1/1: Error: Problem with `mutate()` column `income_category`.\ni `income_category = factor_to_ordinal(income_category)`.\nx could not find function "factor_to_ordinal"
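The note says the call factor_to_ordinal(income_category) was evaluated somewhere the helper is not defined. The minimal base-R sketch below reproduces that exact message by evaluating the same call with two different enclosing environments; the toy data are hypothetical. It is only an assumption, not confirmed here, that tune_grid() hits the equivalent situation, for example if a parallel backend is registered and each resample is prepped in a worker session that never received the global definition of factor_to_ordinal().

```r
# Same helper as in the question
factor_to_ordinal <- function(x) {
  ifelse(x == "Unknown", NA, as.integer(x))
}

# The call that step_mutate() stores in the recipe
expr <- quote(factor_to_ordinal(income_category))

# Hypothetical toy data standing in for one resample
data_env <- list(income_category = factor(c("A", "Unknown")))

# Works: the enclosing environment is the one where the helper was defined
ok <- eval(expr, envir = data_env, enclos = environment())
print(ok)

# Fails: the enclosing environment only sees base R. This stands in (as an
# assumption) for a worker session where the helper was never defined.
msg <- tryCatch(
  eval(expr, envir = data_env, enclos = baseenv()),
  error = function(e) conditionMessage(e)
)
print(msg)
```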

Cross-check on the training data:

xg_rec_4 %>% prep() %>% juice()

# A tibble: 5,316 x 15
      id customer_age education_level income_category total_relationship~ months_inactive_1~ credit_limit
   <dbl>        <dbl>           <int>           <int>               <dbl>              <dbl>        <dbl>
 1  9168           46               3               5                   3                  3         2171
 2  2187           51               4               4                   3                  1        11373
 3  5659           48               3               4                   4                  2        14322
 4   447           57               6               2                   5                  3        12291
 5  6342           39               4               5                   5                  2         1862
 6   496           56               6               5                   4                  3         3219
 7  7064           33               4               1                   6                  3        27499
 8  3978           48               4               4                   1                  2        34516
 9    13           41               4               5                   4                  3         2372
10  8242           46               3               2                   4                  3         3115
# ... with 5,306 more rows, and 8 more variables: total_revolving_bal <dbl>, total_amt_chng_q4_q1 <dbl>,
#   total_trans_amt <dbl>, total_trans_ct <dbl>, total_ct_chng_q4_q1 <dbl>, avg_utilization_ratio <dbl>,
#   churned <fct>, gender_M <dbl>
colSums(xg_rec_4 %>% prep() %>% juice() %>% select_if(is.numeric) %>% is.na())

                      id             customer_age          education_level          income_category 
                       0                        0                        0                        0 
total_relationship_count   months_inactive_12_mon             credit_limit      total_revolving_bal 
                       0                        0                        0                        0 
    total_amt_chng_q4_q1          total_trans_amt           total_trans_ct      total_ct_chng_q4_q1 
                       0                        0                        0                        0 
   avg_utilization_ratio                 gender_M 
                       0                        0 
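For reference, here is what factor_to_ordinal() does on its own, as a small base-R sketch with a hypothetical toy factor (the level labels are made up for illustration, not taken from the churn data): values equal to "Unknown" become NA, and every other value is replaced by its integer level index, returned as an integer vector. That is consistent with the <int> type of education_level and income_category in the juice() output above.

```r
# Same helper as in the question
factor_to_ordinal <- function(x) {
  ifelse(x == "Unknown", NA, as.integer(x))
}

# Hypothetical toy factor with explicitly ordered levels
income <- factor(
  c("Less than $40K", "$40K - $60K", "Unknown", "$120K +"),
  levels = c("Less than $40K", "$40K - $60K", "$60K - $80K",
             "$80K - $120K", "$120K +", "Unknown")
)

# "Unknown" -> NA; every other value -> its position in levels(income)
codes <- factor_to_ordinal(income)
print(codes)
```

Because the codes are level indices, their order follows the factor's level order (alphabetical unless the levels are set explicitly), which is not necessarily the natural ordering of the categories.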

Here is where it worked for David Robinson in the video:

[Screenshot from the video]
