I have a recipe with a step_mutate() call in the middle that performs text transformations on the Titanic dataset, relying on the stringr package.
library(tidyverse)
library(tidymodels)

extract_title <- function(x) {
  stringr::str_remove(
    str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"),
    "\\."
  )
}

rf_recipe <-
  recipe(Survived ~ ., data = titanic_train) %>%
  step_impute_mode(Embarked) %>%
  step_mutate(
    Cabin = if_else(is.na(Cabin), "Yes", "No"),
    Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))
  ) %>%
  step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>%
  update_role(PassengerId, Name, new_role = "id")
This set of transformations works without problems when I run rf_recipe %>% prep() %>% bake(new_data = NULL).
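As a quick sanity check of the helper on its own, here is a minimal sketch that runs it on a few made-up names (these are illustrative strings, not rows from titanic_train). It assumes stringr is attached in the current session, which is what makes the un-namespaced str_extract() call resolve. Note that the first alternative in the pattern includes a trailing space, so a "Mr." title comes back as "Mr " with the space kept.

```r
library(stringr)

# Same helper as in the recipe, unchanged
extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")

# Made-up names for illustration only (assumption: same "Surname, Title. Given" layout)
example_names <- c("Doe, Mr. John", "Doe, Mrs. Jane", "Doe, Miss. Ann", "Doe, Dr. Sam")

# Names with no matching title yield NA, which is mapped to "Other" as in step_mutate()
titles <- extract_title(example_names)
titles <- ifelse(is.na(titles), "Other", titles)
print(titles)
```

In an interactive session with stringr attached this runs without error, which matches what I see with prep() and bake().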
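When I try to fit a random forest model in a workflow, with hyperparameter tuning and 10-fold cross-validation, every model fails. The .notes column of the output states explicitly that there was a problem with the mutate() column Title: the function str_remove() could not be found.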
doParallel::registerDoParallel()
rf_res <-
  tune_grid(
    rf_wf,
    resamples = titanic_folds,
    grid = rf_grid,
    control = control_resamples(save_pred = TRUE)
  )
As this post suggested, I have explicitly told R that str_remove() should be found in the stringr package. Why is this not working, and what could be causing it?