I'm using a medical insurance dataset to hone my modelling skills. It looks like this:

> insur_dt
      age    sex    bmi children smoker    region   charges
   1:  19 female 27.900        0    yes southwest 16884.924
   2:  18   male 33.770        1     no southeast  1725.552
   3:  28   male 33.000        3     no southeast  4449.462
   4:  33   male 22.705        0     no northwest 21984.471
   5:  32   male 28.880        0     no northwest  3866.855
  ---                                                      
1334:  50   male 30.970        3     no northwest 10600.548
1335:  18 female 31.920        0     no northeast  2205.981
1336:  18 female 36.850        0     no southeast  1629.833
1337:  21 female 25.800        0     no southwest  2007.945
1338:  61 female 29.070        0    yes northwest 29141.360

I'm using recipes, part of the tidymodels meta-package, to prepare my data for use in a model, and I have determined that bmi, age and smoker form an interaction term.

insur_split <- initial_split(insur_dt)

insur_train <- training(insur_split)
insur_test <- testing(insur_split)

# we are going to do data processing and feature engineering with recipes

# below, we are going to predict charges from age, bmi and smoker
insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_zv(all_numeric()) %>%
    step_normalize(all_numeric()) %>%
    step_interact(~ bmi:smoker:age) %>% 
    prep()

According to the tidymodels guides/documentation, I have to specify interactions as a step in the recipe, using step_interact. However, I get an error when I try to do so:

> insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
+     step_dummy(all_nominal()) %>%
+     step_zv(all_numeric()) %>%
+     step_normalize(all_numeric()) %>%
+     step_interact(~ bmi:smoker:age) %>% 
+     prep()
Interaction specification failed for: ~bmi:smoker:age. No interactions will be created.partial match of 'object' to 'objects'

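I'm new to modelling and not quite sure why I'm getting this error. I'm only trying to say that charges is explained by all the other predictors, and that smoker (a yes/no factor), age (numeric) and bmi (double) all interact with each other to inform the outcome. What am I doing wrong?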

1 Answer

From the documentation:

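step_interact can create interactions between variables. It is primarily intended for numeric data; categorical variables should probably be converted to dummy variables using step_dummy() prior to being used for interactions.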
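
To see why the factor has to become a numeric column first, here is a rough base-R sketch of what a three-way interaction column amounts to: the element-wise product of numeric columns. The data frame `toy` and its values are made up for illustration, and the sketch ignores any normalization; it is not the internals of step_interact, only the idea behind it.

```r
# Hypothetical toy data (values are invented for illustration).
toy <- data.frame(
  bmi    = c(27.9, 33.8, 33.0),
  age    = c(19, 18, 28),
  smoker = factor(c("yes", "no", "no"))
)

# A factor cannot be multiplied directly, so encode it as a 0/1 dummy first.
smoker_yes <- as.numeric(toy$smoker == "yes")

# The interaction term is then the row-wise product of the three columns.
toy$bmi_x_age_x_smoker_yes <- toy$bmi * toy$age * smoker_yes

print(toy)
```

The product is non-zero only for rows where the smoker dummy is 1, which is what lets the model fit a different bmi-by-age effect for smokers.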

step_dummy(all_nominal()) turns the variable smoker into smoker_yes. Below, you can see that I simply changed the name smoker in the interaction term to smoker_yes:

insur_rec <- recipe(charges ~ bmi + age + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_interact(terms = ~ bmi:age:smoker_yes) %>% 
    prep(verbose = TRUE, log_changes = TRUE)
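
If you are unsure what name step_dummy() gave the new column, one way to check is to prep a recipe that stops after the dummy step and look at the baked column names. The sketch below does that on a small made-up data frame (`toy` is hypothetical and only stands in for insur_train). It assumes a version of recipes in which bake() accepts new_data = NULL; older versions used juice() for the same purpose.

```r
library(recipes)

# Hypothetical toy data standing in for insur_train (values are made up).
toy <- data.frame(
  age     = c(19, 18, 28, 33, 32, 50),
  bmi     = c(27.9, 33.8, 33.0, 22.7, 28.9, 31.0),
  smoker  = factor(c("yes", "no", "no", "no", "yes", "no")),
  charges = c(16885, 1726, 4449, 21984, 3867, 10601)
)

# Prep only the dummy step and inspect the resulting column names.
dummy_only <- recipe(charges ~ age + bmi + smoker, data = toy) %>%
  step_dummy(all_nominal()) %>%
  prep()
dummy_names <- names(bake(dummy_only, new_data = NULL))
print(dummy_names)

# Use the dummy column name found above in the interaction formula.
with_interaction <- recipe(charges ~ age + bmi + smoker, data = toy) %>%
  step_dummy(all_nominal()) %>%
  step_interact(terms = ~ bmi:age:smoker_yes) %>%
  prep()
interaction_names <- names(bake(with_interaction, new_data = NULL))
print(interaction_names)
```

The second set of names should contain one more column than the first, which is the interaction term added by step_interact.
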
Answered 2020-12-03T14:28:29.183