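I am trying to supply two functions inside mutate(across(where(is.factor))) to order factor levels by frequency and drop unused levels. The code does not seem to work as expected. What could be going wrong?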

#---- Libraries ----

library(tidyverse)

#---- Data ----

set.seed(2021)

df <- tibble(
  a1 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  a2 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  gender = gl(2, 15, labels = c("Males", "Females")),
  b2 = gl(3, 10, labels = c("Primary", "Secondary", "Tertiary", "Unknown")),
  c1 = gl(3, 10, labels = c("15-19", "20-24", "25-30", "30-35")),
  outcome = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  weight = runif(30, 1, 12)
)

#---- Problem ----

df <- df %>%
  mutate(across(where(is.factor), list(fct_infreq, fct_drop)))

levels(df$b2)

# The unused levels are not dropped

1 Answer

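The problem is that passing a list of functions to across() creates new columns rather than overwriting the existing ones. The resulting data frame therefore contains two extra columns, b2_1 and b2_2, one for each function applied, while the original b2 is left unchanged. That is why levels(df$b2) still shows the unused level.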

If you run levels(df$b2_2), you will see the unused level dropped, as you wanted.
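
To see the naming behaviour in isolation, here is a minimal sketch on a small made-up tibble. The data frame toy and its column f are hypothetical and not part of the question's data; only dplyr and forcats are assumed to be installed.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: "c" is declared as a level but never occurs
toy <- tibble(f = factor(c("a", "b", "b"), levels = c("a", "b", "c")))

# Same pattern as in the question: an unnamed list of two functions
res <- toy %>%
  mutate(across(where(is.factor), list(fct_infreq, fct_drop)))

names(res)       # the original column plus one new column per function
levels(res$f)    # the original column still carries the unused level
levels(res$f_2)  # the fct_drop() result, without the unused level
```

Because the list is unnamed, the new columns are suffixed with each function's position in the list, which is where the _1 and _2 in the question's output come from.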

If your goal is to drop the unused levels first and then reorder, you need to run successive mutate() calls:

df <- df %>%
  mutate(across(where(is.factor), fct_drop)) %>% 
  mutate(across(where(is.factor), fct_infreq)) 
  

Or nest the two functions inside a single mutate():

df <- df %>%
  mutate(across(where(is.factor), ~fct_infreq(fct_drop(.x))))
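
As a quick check of the nested form, the following sketch applies it to a small made-up tibble. Again, toy and its column f are hypothetical names used only for illustration.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: "c" is an unused level and "b" is the most frequent value
toy <- tibble(f = factor(c("a", "b", "b"), levels = c("a", "b", "c")))

# A single lambda overwrites each factor column in place
fixed <- toy %>%
  mutate(across(where(is.factor), ~ fct_infreq(fct_drop(.x))))

names(fixed)     # no extra columns are created
levels(fixed$f)  # unused level removed, remaining levels ordered by frequency
```

Since across() receives a single function here rather than a list, the result keeps the original column names.
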
Answered 2021-04-13T10:22:05.683