I want to re-level a factor variable based on the value of another variable. For example:
factors <- structure(list(color = c("RED", "GREEN", "BLUE", "YELLOW", "BROWN"
), count = c(2, 5, 11, 1, 19)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
> factors
# A tibble: 5 x 2
color count
<chr> <dbl>
1 RED 2
2 GREEN 5
3 BLUE 11
4 YELLOW 1
5 BROWN 19
This is what I want to produce:
##Group all levels with count < 10 into "OTHER"
> factors.out
# A tibble: 3 x 2
color count
<chr> <dbl>
1 OTHER 8
2 BLUE 11
3 BROWN 19
I thought this would be a job for forcats::fct_lump(), but it does not lump anything here:
##Keep 3 levels
factors %>%
  mutate(color = fct_lump(color, n = 3))
# A tibble: 5 x 2
color count
<fct> <dbl>
1 RED 2
2 GREEN 5
3 BLUE 11
4 YELLOW 1
5 BROWN 19
I know I can do it this way:
factors %>%
  mutate(color = ifelse(count < 10, "OTHER", color)) %>%
  group_by(color) %>%
  summarise(count = sum(count))
But I think, or at least hope, that there is a way to do this in forcats.
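One possible direction, offered as a sketch rather than a confirmed answer: fct_lump() ranks levels by how often they occur in the vector, and here every color occurs exactly once, which would explain why nothing was lumped. Assuming forcats 0.5.0 or later, fct_lump_min() takes a weight vector w, so count could serve as the weight with 10 as the threshold. The data is rebuilt with tibble() so the snippet is self-contained; factors.out is the name used for the desired result.

```r
library(dplyr)
library(forcats)

# Same data as in the question, rebuilt so the snippet runs on its own
factors <- tibble(
  color = c("RED", "GREEN", "BLUE", "YELLOW", "BROWN"),
  count = c(2, 5, 11, 1, 19)
)

factors.out <- factors %>%
  # Weight each level by `count`; levels whose total weight is below 10
  # are collapsed into "OTHER" (assumes fct_lump_min() is available)
  mutate(color = fct_lump_min(color, min = 10, w = count,
                              other_level = "OTHER")) %>%
  # Collapse the lumped rows into a single row per level
  group_by(color) %>%
  summarise(count = sum(count))

factors.out
```

The rows come out in factor-level order, so OTHER may not appear first as in the desired output; the grouping and sums are what this sketch is meant to show.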