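My list of levels that should NOT be collapsed ("Alberta", "British Columbia", "Ontario", "Quebec") is shorter than the list of levels that should be (all the others). I can't work out how to negate the levels passed to fct_collapse, i.e. to say "all levels except the following" (the code below shows what I am aiming for). Any suggestions?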

df$`Province group` %<>% fct_collapse(df$Province, `Smaller provinces` = !c("Alberta", "British Columbia", "Ontario", "Quebec"))

3 Answers

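I'm a little confused by some of the syntax you use here, but this solution should work for you! It uses dplyr's pipe structure and underscores instead of spaces in variable names (i.e. variable_name rather than `variable name`).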

    library(dplyr)
    library(forcats)

    # What I imagine your df$Province variable looks like
    df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))

    # Define your big provinces in this vector
    big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")

    # Modify the dataset (i.e. do the fct_collapse)
    df %>%
      mutate(
        Province_group = fct_collapse(
          Province, # the variable to collapse
          # "Smaller provinces" is any province not in the vector big_provinces
          "Smaller provinces" = unique(Province[!(Province %in% big_provinces)])
        ) # end of fct_collapse
      ) # end of mutate

If `Province` is a factor variable, you will need to convert it to a character variable first.
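
As a minimal sketch of that conversion step (the names `df_fct` and `df_chr` are made up for illustration and are not part of the original answer), the factor column can be turned into character inside the same `mutate()` call before it is collapsed:

```r
library(dplyr)
library(forcats)

big_provinces <- c("Ontario", "Alberta", "Quebec", "British Columbia")

# Hypothetical data: the same provinces as above, but stored as a factor
df_fct <- tibble(Province = factor(rep(
  c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"),
  10
)))

df_chr <- df_fct %>%
  mutate(
    # Convert to character first, so the subset below is a character vector
    Province = as.character(Province),
    Province_group = fct_collapse(
      Province,
      "Smaller provinces" = unique(Province[!(Province %in% big_provinces)])
    )
  )

count(df_chr, Province_group)
```

`mutate()` evaluates its arguments in order, so the second expression already sees the character version of `Province`.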

P.S. Hello from Quebec

Answered 2020-06-15T17:54:51.823

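fct_lump is the best solution for this problem (but only because the provinces to keep happen to be the 4 provinces with the largest n). If anyone finds a shorter solution than Rui Barradas's, I would still be interested for future work with factors.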

    df %>%
      mutate(`Compared to smaller provinces` = fct_lump(Province, n = 4)) %>%
      count(`Compared to smaller provinces`)

This produces 5 groups, where "Other" contains all the other provinces, i.e. those with a smaller n.
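
Because fct_lump chooses levels by frequency, it only matches the question when the four big provinces are also the four most frequent ones. The sketch below uses made-up counts (an assumption, not the asker's data) to show that case. It also shows `fct_other()`, a forcats function not mentioned in the answers above, which keeps a named set of levels regardless of their counts and may be the shorter form being asked about:

```r
library(forcats)

big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

# Hypothetical counts, chosen so that the big provinces are the most frequent
province <- factor(rep(
  c("Ontario", "Quebec", "British Columbia", "Alberta", "Manitoba", "PEI"),
  times = c(50, 40, 30, 20, 5, 2)
))

# Frequency-based: keep the 4 most common levels, lump the rest into "Other"
lumped <- fct_lump(province, n = 4)
table(lumped)

# Name-based: keep exactly the listed levels, whatever their counts are
kept <- fct_other(province, keep = big_provinces, other_level = "Smaller provinces")
table(kept)
```

With these counts both calls group Manitoba and PEI together. If a small province happened to outnumber a big one, only the `fct_other()` version would still keep the intended four.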

Answered 2020-07-04T19:22:04.470

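Here is a solution that uses `levels` to get the factor's levels. It then negates `%in%` to select every level that is not in the list to keep, and those are the levels that get collapsed.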
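
The subsetting step on its own can be sketched in base R as follows (the vector `province` is a small made-up example). `setdiff()` is shown as an equivalent, slightly shorter way to write the same selection:

```r
# Hypothetical factor with one value per province
province <- factor(c("Ontario", "PEI", "Alberta", "Manitoba",
                     "Quebec", "Nova Scotia", "British Columbia"))

big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

# %in% binds tighter than !, so this reads as !(levels(province) %in% big_provinces)
small_levels <- levels(province)[!levels(province) %in% big_provinces]
small_levels
# [1] "Manitoba"    "Nova Scotia" "PEI"

# Same result using set difference
identical(small_levels, setdiff(levels(province), big_provinces))
# [1] TRUE
```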

First, recreate the data set from user @R me matey's answer.

    library(magrittr)
    library(dplyr)
    library(forcats)

    df <- tibble(Province = rep(c("Ontario", "Alberta", "Quebec", "British Columbia", "PEI", "Manitoba", "Nova Scotia"), 10))
    df$Province <- factor(df$Province)

Now the problem itself.

    big_provinces <- c("Alberta", "British Columbia", "Ontario", "Quebec")

    df %<>%
      mutate(Province = fct_collapse(
        Province,
        `Smaller provinces` = levels(Province)[!levels(Province) %in% big_provinces]
      ))

    df
    ## A tibble: 70 x 1
    #   Province         
    #   <fct>            
    # 1 Ontario          
    # 2 Alberta          
    # 3 Quebec           
    # 4 British Columbia 
    # 5 Smaller provinces
    # 6 Smaller provinces
    # 7 Smaller provinces
    # 8 Ontario          
    # 9 Alberta          
    #10 Quebec           
    ## ... with 60 more rows

Answered 2020-06-15T19:53:49.883