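I would like to use dplyr to determine which observations in a data frame satisfy the following condition:
- Within each Group, the sum of Var2 over the observations where Var1 == "good" is greater than the sum of Var2 over the observations where Var1 == "bad"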
Here is a toy data frame:
library(dplyr)
set.seed(seed = 10)
df <- data.frame("Id" = 1:12,
                 "Group" = rep(toupper(letters[1:3]), each = 4),
                 "Var1" = sample(rep(c("good", "bad"), times = 1000), size = 12),
                 "Var2" = sample(rep(1:10, times = 1000), size = 12))
print(df)
   Id Group Var1 Var2
1   1     A good    6
2   2     A  bad    9
3   3     A good   10
4   4     A good    7
5   5     B  bad    9
6   6     B  bad    1
7   7     B  bad    6
8   8     B good    6
9   9     C good    1
10 10     C  bad    8
11 11     C good    4
12 12     C  bad    2
So far I have determined that I should use some combination of group_by(), summarise(), and filter(), but I can't seem to find a good way to do this. Here is what I have come up with so far:
keepers <- df %>%
  group_by(Group, Var1) %>%
  summarise(Total = sum(Var2)) %>%
  print()
Source: local data frame [6 x 3]
Groups: Group [?]

  Group  Var1 Total
  (chr) (chr) (int)
1     A   bad     9
2     A  good    23
3     B   bad    16
4     B  good     6
5     C   bad    10
6     C  good     5
What next steps should I take? The final analysis should return "A", because it is the only Group where the Total for good is greater than the Total for bad.
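One possible direction, offered as a minimal sketch rather than a definitive answer: group by Group only, compute the two conditional sums inside a single summarise() call, and then filter() on the comparison. The column names good_total and bad_total are illustrative names introduced here, and the data frame is rebuilt by hand from the printed output above so that the result does not depend on the random number generator.

```r
library(dplyr)

# Toy data copied from the printed output above (not re-sampled).
df <- data.frame(
  Id    = 1:12,
  Group = rep(c("A", "B", "C"), each = 4),
  Var1  = c("good", "bad", "good", "good",
            "bad", "bad", "bad", "good",
            "good", "bad", "good", "bad"),
  Var2  = c(6, 9, 10, 7, 9, 1, 6, 6, 1, 8, 4, 2)
)

keepers <- df %>%
  group_by(Group) %>%
  # One row per Group, with the Var2 total for each Var1 level.
  # good_total and bad_total are hypothetical column names.
  summarise(
    good_total = sum(Var2[Var1 == "good"]),
    bad_total  = sum(Var2[Var1 == "bad"])
  ) %>%
  # Keep only the groups where the "good" total exceeds the "bad" total.
  filter(good_total > bad_total)

print(keepers$Group)
```

On this toy data the only remaining group is A (23 versus 9). If the goal is to keep the original observations rather than one summary row per group, the same comparison can presumably be placed directly inside filter() after group_by(Group), skipping summarise().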