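I am trying to use as_factor() to create factor levels in the order in which they appear in the underlying data. The function works as expected when the underlying data is character, but not when it is numeric.
I tried the example code given in the as_factor documentation. It uses a character variable, and the levels come out in the same order as the underlying variable, which is what as_factor is supposed to do. For a numeric variable, however, the levels are sorted, and as.factor and as_factor give the same order.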
# anomaly with as_factor -- is this a feature or a bug?
# Bill Anderson August 2019
require(tidyverse)
#> Loading required package: tidyverse
#> Warning: package 'tidyverse' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.1
# example from as_factor documentation
x <- c("a", "z", "g")
as_factor(x) # preserves input order, as desired
#> [1] a z g
#> Levels: a z g
as.factor(x) # factor levels obtained by sorting data
#> [1] a z g
#> Levels: a g z
# numeric example
y <- c(1, 3, 2)
as_factor(y) # factor levels obtained by sorting data -- not what I expected
#> [1] 1 3 2
#> Levels: 1 2 3
as.factor(y) # factor levels obtained by sorting data
#> [1] 1 3 2
#> Levels: 1 2 3
identical(as_factor(y), as.factor(y))
#> [1] TRUE
# explicit character conversion
z <- as.character(y)
as_factor(z) # preserves input order, as desired
#> [1] 1 3 2
#> Levels: 1 3 2
as.factor(z) # factor levels obtained by sorting data
#> [1] 1 3 2
#> Levels: 1 2 3
# one can also put everything into a data frame,
# so that the impact of the factor order is clearly visible
mtcars %>% group_by(cyl) %>%
summarize(meandisp = mean(disp)) # cylinder order is sorted
#> # A tibble: 3 x 2
#> cyl meandisp
#> <dbl> <dbl>
#> 1 4 105.
#> 2 6 183.
#> 3 8 353.
mtcars %>% group_by(as_factor(cyl)) %>%
summarize(meandisp = mean(disp)) # cylinder order is still sorted
#> # A tibble: 3 x 2
#> `as_factor(cyl)` meandisp
#> <fct> <dbl>
#> 1 4 105.
#> 2 6 183.
#> 3 8 353.
mtcars %>% group_by(as_factor(as.character(cyl))) %>%
summarize(meandisp = mean(disp)) # cylinder order follows data
#> # A tibble: 3 x 2
#> `as_factor(as.character(cyl))` meandisp
#> <fct> <dbl>
#> 1 6 183.
#> 2 4 105.
#> 3 8 353.
Created on 2019-08-15 by the reprex package (v0.3.0)
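There is no error message. The problem is simply that I do not get the order of the underlying data unless I explicitly convert to character.
I am not sure whether this behavior is a feature or a bug. But if it is a feature, I suggest that it be mentioned in the function documentation.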
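In the meantime, here is a minimal sketch of a workaround that does not depend on how as_factor() treats numeric input. It uses only base R and asks for first-appearance order explicitly by passing unique() as the levels. The names f, cyl_f, and mean_disp are just illustrative, and this is one possible approach rather than the recommended forcats idiom.

```r
# Sketch: request first-appearance level order explicitly (base R only).
y <- c(1, 3, 2)

# unique() keeps values in order of first appearance, so passing it as
# `levels` yields the input order instead of the sorted order.
f <- factor(y, levels = unique(y))
print(levels(f))
#> [1] "1" "3" "2"

# Same idea on the mtcars example: cylinder levels follow the data.
cyl_f <- factor(mtcars$cyl, levels = unique(mtcars$cyl))

# tapply() reports groups in level order, so the result is no longer sorted.
mean_disp <- tapply(mtcars$disp, cyl_f, mean)
print(mean_disp)
```

This should give the same grouping order as the as_factor(as.character(cyl)) version above, without the detour through character.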