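I want to use dplyr to determine which observations in a data frame satisfy the following condition:

  • Within each Group, the sum of Var2 over observations where Var1 == good is greater than the sum of Var2 over observations where Var1 == bad

Here is a toy data frame: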

library(dplyr)

set.seed(seed = 10)

df <- data.frame("Id" = 1:12,
                 "Group" = paste(sapply(toupper(letters[1:3]), rep, times = 4,simplify = T)),
                 "Var1" = sample(rep(c("good","bad"),times = 1000),size = 12),
                 "Var2" = sample(rep(1:10, times = 1000),size = 12))

print(df)

   Id Group Var1 Var2
1   1     A good    6
2   2     A  bad    9
3   3     A good   10
4   4     A good    7
5   5     B  bad    9
6   6     B  bad    1
7   7     B  bad    6
8   8     B good    6
9   9     C good    1
10 10     C  bad    8
11 11     C good    4
12 12     C  bad    2

So far I have worked out that I should use some combination of group_by(), summarise(), and filter(), but I can't seem to find a good way to do it. This is what I have come up with so far:

keepers <- df %>% 
        group_by(Group, Var1) %>%
        summarise(Total = sum(Var2)) %>% 
        print()

Source: local data frame [6 x 3]
Groups: Group [?]

  Group  Var1 Total
  (chr) (chr) (int)
1     A   bad     9
2     A  good    23
3     B   bad    16
4     B  good     6
5     C   bad    10
6     C  good     5

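What next steps should I take? The final analysis should return "A", since it is the only Group where the Total for the good observations is greater than the Total for the bad observations.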

2 Answers

How about using spread and filter?

> library(tidyr)
> df %>% group_by(Group, Var1) %>%
+    summarise(Total = sum(Var2)) %>%
+    spread(Var1,Total) %>%
+    filter(good>bad)
Source: local data frame [1 x 3]

  Group bad good
1     A   9   23
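
If the goal is to keep the observations themselves rather than the per-group totals, a grouped filter() may be enough without reshaping. The following is a minimal sketch, not the only way to do it. It assumes the toy data is entered literally from the table printed in the question (so it does not depend on sample()), and the name keepers is reused from the question purely for illustration.

```r
library(dplyr)

# Toy data entered literally from the table printed in the question
df <- data.frame(
  Id    = 1:12,
  Group = rep(c("A", "B", "C"), each = 4),
  Var1  = c("good", "bad", "good", "good",
            "bad", "bad", "bad", "good",
            "good", "bad", "good", "bad"),
  Var2  = c(6, 9, 10, 7, 9, 1, 6, 6, 1, 8, 4, 2),
  stringsAsFactors = FALSE
)

# Within each Group, compare the "good" total of Var2 with the "bad" total
# and keep all rows of the groups where the "good" total is larger
keepers <- df %>%
  group_by(Group) %>%
  filter(sum(Var2[Var1 == "good"]) > sum(Var2[Var1 == "bad"])) %>%
  ungroup()

print(keepers)
```

With this data only the four rows of Group A should remain, which agrees with the totals shown above (23 for good against 9 for bad).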
answered 2015-12-19T00:01:11.893

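A similar option with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Group' and 'Var1', get the sum of 'Var2', reshape from 'long' to 'wide', and keep the rows where 'good' is greater than 'bad'.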

library(data.table)
dcast(setDT(df)[, sum(Var2) , by = .(Group, Var1)], 
               Group~Var1, value.var='V1')[good>bad]
#   Group bad good
#1:     A   9   23
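
The same comparison can probably also be written without the dcast() step, by computing both totals per Group directly. This is only a sketch under the assumption that Var1 takes just the values "good" and "bad". The data is again entered literally from the question's table, and the names dt, totals, and result are illustrative.

```r
library(data.table)

# Toy data entered literally from the table printed in the question
dt <- data.table(
  Group = rep(c("A", "B", "C"), each = 4),
  Var1  = c("good", "bad", "good", "good",
            "bad", "bad", "bad", "good",
            "good", "bad", "good", "bad"),
  Var2  = c(6, 9, 10, 7, 9, 1, 6, 6, 1, 8, 4, 2)
)

# One row per Group, with the "good" and "bad" totals as separate columns
totals <- dt[, .(good = sum(Var2[Var1 == "good"]),
                 bad  = sum(Var2[Var1 == "bad"])),
             by = Group]

# Keep the groups whose "good" total exceeds the "bad" total
result <- totals[good > bad]

print(result)
```

Chaining the two steps as dt[...][good > bad] would give the same rows; they are split here only to make the intermediate table visible.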
answered 2015-12-19T05:27:57.190