When I create a factor with step_num2factor(), every level is created correctly except the last one, which is filled in with an NA instead.

MWE

library(recipes)  # also attaches dplyr, which provides tibble() and %>%

test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))

When printed, it looks like this:

# A tibble: 7 x 2
   pred target
  <dbl>  <dbl>
1     0      0
2     1      1
3     2      0
4     3      1
5     4      1
6     5      1
7     8      0

Running the recipe step and comparing the result

test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))

my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")

recipe(target ~ pred, data = test) %>% 
step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>% 
prep(training = test) %>% 
bake(new_data = test)

Note: the transform is there because a factor cannot have a level 0. (source)

The transformed data set after prep and bake

# A tibble: 7 x 2
  pred  target
  <fct>  <dbl>
1 zero       0
2 one        1
3 two        0
4 three      1
5 four       1
6 five       1
7 NA         0

The NA should not be there. It should be the "eight" category. What am I doing wrong?

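Note: I also tried "six", because I thought the function might only accept levels that spell out the value rather than arbitrarily named levels, but that made no difference.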

1 Answer

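You need to make sure that your input, your levels, and your transform match up exactly. You use transform = function(x) x + 1 so that an input of 0 can be captured. As a result, when your input is n, step_num2factor() picks the (n+1)th value of levels.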

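So when your input is 8, step_num2factor() returns the 8+1 = 9th value of levels, which does not exist because levels only has length 7. That produces the NA you see. The code below should illustrate the issue: here the largest input is 6, so it maps to the 7th level, which is "eight".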

library(recipes)

my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")

test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 6), target = c(0,1,0,1,1,1,0))

recipe(target ~ pred, data = test) %>% 
  step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>% 
  prep() %>% 
  bake(new_data = NULL)
#> # A tibble: 7 x 2
#>   pred  target
#>   <fct>  <dbl>
#> 1 zero       0
#> 2 one        1
#> 3 two        0
#> 4 three      1
#> 5 four       1
#> 6 five       1
#> 7 eight      0
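
The lookup can be mimicked in base R. This is only a rough sketch of the indexing described above, not the actual recipes implementation; the names idx and looked_up are made up for illustration.

```r
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")
pred <- c(0, 1, 2, 3, 4, 5, 8)

# What transform = function(x) x + 1 produces: positions into my_levels
idx <- pred + 1

# Positional lookup; a position past the end of the vector yields NA
looked_up <- my_levels[idx]

length(my_levels)  # number of available positions
max(idx)           # largest position requested
is.na(looked_up)   # flags the inputs that fall outside my_levels
```

Comparing max(idx) with length(my_levels) before building the recipe is a quick way to spot this kind of mismatch.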

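To fix your problem, you need to make sure that my_levels has an entry at every position up to the largest transformed value: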

test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))

my_levels <- c("zero", "one", "two", "three", "four", "five", 
               "six", "seven", "eight", "nine", "ten")

recipe(target ~ pred, data = test) %>% 
  step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>% 
  prep() %>% 
  bake(new_data = NULL)
#> # A tibble: 7 x 2
#>   pred  target
#>   <fct>  <dbl>
#> 1 zero       0
#> 2 one        1
#> 3 two        0
#> 4 three      1
#> 5 four       1
#> 6 five       1
#> 7 eight      0
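
If you would rather keep only the seven original labels, another option is to map each observed value to its position instead of adding 1. The sketch below shows the idea in base R with match(); observed_values and to_level_index are hypothetical names introduced here. Passing such a function as transform to step_num2factor() should behave the same way, but that is an assumption I have not verified against every recipes version.

```r
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")
pred <- c(0, 1, 2, 3, 4, 5, 8)

# The distinct values expected in pred, in the same order as my_levels
# (hypothetical helper vector, not part of recipes)
observed_values <- c(0, 1, 2, 3, 4, 5, 8)

# Map each value to its position in observed_values;
# values that are not listed map to NA
to_level_index <- function(x) match(x, observed_values)

labelled <- my_levels[to_level_index(pred)]
labelled
```

The trade-off is that any value missing from observed_values still becomes NA, so the vector has to list every value you expect to see.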

Created on 2021-03-27 by the reprex package (v0.3.0)

Answered 2021-03-27T21:04:12.290