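As you can see from my code, I am trying to include feature selection in my tidymodels workflow. I am using some Kaggle data to try to predict customer churn.
To apply the preprocessing to both the training and the test data, I bake the recipe after calling prep().
However, if I want to tune the top_p argument of step_select_roc(), I don't know how to prep() the recipe afterwards. Applying it as in my reprex results in an error.
Maybe I have to restructure my workflow and split up some of the recipe tasks to get this done. What is the best way to achieve this?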
#### LIBS
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(themis))
suppressPackageStartupMessages(library(recipeselectors))
#### INPUT
# get dataset from: https://www.kaggle.com/shrutimechlearn/churn-modelling
data <- fread("Churn_Modelling.csv")
# split data
set.seed(seed = 1972)
train_test_split <-
  rsample::initial_split(
    data = data,
    prop = 0.80
  )
train_tbl <- train_test_split %>% training()
test_tbl <- train_test_split %>% testing()
#### FEATURE ENGINEERING
# Define the recipe
recipe <- recipe(Exited ~ ., data = train_tbl) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_num2factor(all_outcomes(),
                  levels = c("No", "Yes"),
                  transform = function(x) {x + 1}) %>%
  step_normalize(all_numeric(), -has_role(match = "Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune()) %>%
  # this prep() call is where the error occurs: top_p is still a tune() placeholder
  prep()
# Bake it
train_baked <- recipe %>% bake(train_tbl)
test_baked <- recipe %>% bake(test_tbl)
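For context, one direction that might work is to leave the recipe unprepped, put it in a workflow() together with a model, let tune_grid() pick top_p, and only prep() after finalize_recipe() has filled in the value. The following is a rough sketch under several assumptions that are not part of my original code: the logistic regression model, the 5-fold resampling, the candidate values for top_p, and the names train_df, test_df, rec_tunable, wf, folds, grid, res, best and final_rec are all placeholders. It also converts the outcome to a factor before the recipe instead of using step_num2factor(), and it reuses train_tbl and test_tbl from above.

```r
library(tidymodels)
library(recipeselectors)

# Assumption: convert the outcome up front instead of inside the recipe
to_factor_outcome <- function(df) {
  as_tibble(df) %>%
    mutate(Exited = factor(Exited, levels = c(0, 1), labels = c("No", "Yes")))
}
train_df <- to_factor_outcome(train_tbl)
test_df  <- to_factor_outcome(test_tbl)

# Same steps as above, but WITHOUT prep(): the tune() placeholder stays in place
rec_tunable <- recipe(Exited ~ ., data = train_df) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_normalize(all_numeric(), -has_role("Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune())

# Placeholder model, needed only so that there is something to evaluate
wf <- workflow() %>%
  add_recipe(rec_tunable) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Placeholder resampling scheme and candidate values for top_p
set.seed(1972)
folds <- vfold_cv(train_df, v = 5)
grid  <- tibble(top_p = c(2L, 4L, 6L, 8L))

# The recipe is prepped inside each resample for every candidate top_p
res  <- tune_grid(wf, resamples = folds, grid = grid,
                  metrics = metric_set(roc_auc))
best <- select_best(res, metric = "roc_auc")

# Only now is the recipe free of tune() placeholders and safe to prep()
final_rec   <- finalize_recipe(rec_tunable, best) %>% prep()
train_baked <- bake(final_rec, new_data = NULL)
test_baked  <- bake(final_rec, new_data = test_df)
```

If that is the intended pattern, the baked tables would only be needed for inspection, since the finalized workflow could be fitted on the training data directly.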