
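I want to add a new variable to my data frame that gives, for each group, the number of unique values of one variable (state), ignoring the other variables.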

Input data

df <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
                 state=c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
                 group=c(1,1,2,3,3,3,4,4,4),
                 age=c(12,33,57,98,45,67,16,85,22)
                 )
df

Desired output

want <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
                   state=c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
                   group=c(1,1,2,3,3,3,4,4,4),
                   age=c(12,33,57,98,45,67,16,85,22),
                   count=c(1,1,1,2,2,2,3,3,3)
                   )
want
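To make the target concrete: `count` is one number per group (the number of distinct states in that group) repeated on every row of the group. A minimal base-R sketch of just the per-group numbers, assuming only the `df` shown above (`per_group` is a hypothetical name used for illustration):

```r
# Same input data as in the question
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9),
                 state = c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
                 group = c(1,1,2,3,3,3,4,4,4),
                 age = c(12,33,57,98,45,67,16,85,22))

# One value per group: how many distinct states appear in that group
per_group <- tapply(df$state, df$group, function(x) length(unique(x)))
print(per_group)
```

Group 3 contains TX, TX and AZ, so it gets 2; group 4 contains GA, TX and WA, so it gets 3. These are the values repeated row by row in `want$count`.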

1 Answer


We need `n_distinct` applied within each group:

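library(dplyr)
df %>% 
  group_by(group) %>%                      # work within each group
  mutate(count = n_distinct(state)) %>%    # distinct states per group, repeated on every row
  ungroup()                                # drop the grouping afterwards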
answered 2022-02-15T15:54:57.297
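If dplyr is not available, a base-R sketch of the same idea uses `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is an alternative technique, not part of the answer above; `want_count` is a hypothetical name holding the `count` column from the desired output. `state` is converted to integer codes first because `ave()` writes the results back into a copy of its first argument, so passing the character column directly would turn the counts into strings.

```r
# Same input data as in the question
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9),
                 state = c("CT","CT","AK","TX","TX","AZ","GA","TX","WA"),
                 group = c(1,1,2,3,3,3,4,4,4),
                 age = c(12,33,57,98,45,67,16,85,22))

# Expected counts, taken from want$count in the question
want_count <- c(1,1,1,2,2,2,3,3,3)

# Distinct states map to distinct integer codes, so counting unique codes
# within each group equals counting unique states within each group
df$count <- ave(as.integer(factor(df$state)), df$group,
                FUN = function(x) length(unique(x)))
print(df)
```

Unlike the dplyr version, this assigns the column in place on `df` rather than returning a new data frame.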