
I have a table like this:

df <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
                  P2 = c(0,-2,1,2,1,0,"A"),
                  P3 = c(-1,2,0,2,1,0,"B"),
                  P4 = c(2,0,-1,0,-1,0,"B"),
                  Names = c("G1","G2","G3","G1","G2","G3","Group"),
                  stringsAsFactors = FALSE)

which looks like this:

Names    P1   P2    P3   P4
G1       1    0     -1   2
G2       0    -2    2    0
G3       0    1     0    -1
G1       0    2     2    0
G2       0    1     1    -1
G3       0    0     0    0
Group    A    A     B    B

Here, A and B are the grouping labels for the variables P1, P2, P3, P4.

I want to build a contingency table over Id (G1, G2, ...), Group (A, B) and Var (-2, -1, 0, 1, 2), for example:

Id    Group Var    Count
G1    A     -2     0
G1    A     -1     0
G1    A     0      1
G1    A     1      1
G1    A     2      0
G1    B     -2     0
G1    B     -1     1
G1    B     0      0
G1    B     1      0
G1    B     2      1
G2    A     -2     1
G2    A     -1     0
G2    A     0      1
...

Is there a way to do this in R without using a lot of loops?


2 Answers


Assuming you want to group the P1 & P2 columns as A and the P3 & P4 columns as B, you can approach it with the data.table package as follows:

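library(data.table)

# Inner melt: stack P1/P2 into column "A" and P3/P4 into column "B"
# (columns are addressed by position: 1 = Id, 2:5 = P1..P4).
# Outer melt: stack "A" and "B" into one 'value' column labelled by 'group'.
DT <- melt(melt(setDT(df),
                measure.vars = list(c(2,3),c(4,5)),
                value.name = c("A","B")),
           id = 1, measure.vars = 3:4, variable.name = 'group'
           )[order(Id,group)][, val2 := value]

# Join DT onto every Id/group/value combination, then count matched rows;
# combinations that do not occur in DT get val2 = NA and so a count of 0.
DT[CJ(Id = Id, group = group, value = value, unique = TRUE)
   , on = .(Id, group, value)
   ][, .(counts = sum(!is.na(val2))), by = .(Id, group, value)]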

which gives:

    Id group value counts
 1: G1     A    -2      0
 2: G1     A    -1      0
 3: G1     A     0      2
 4: G1     A     1      1
 5: G1     A     2      1
 6: G1     B    -2      0
 7: G1     B    -1      1
 8: G1     B     0      1
 9: G1     B     1      0
10: G1     B     2      2
11: G2     A    -2      1
12: G2     A    -1      0
13: G2     A     0      2
14: G2     A     1      1
15: G2     A     2      0
16: G2     B    -2      0
17: G2     B    -1      1
18: G2     B     0      1
19: G2     B     1      1
20: G2     B     2      1
21: G3     A    -2      0
22: G3     A    -1      0
23: G3     A     0      3
24: G3     A     1      1
25: G3     A     2      0
26: G3     B    -2      0
27: G3     B    -1      1
28: G3     B     0      3
29: G3     B     1      0
30: G3     B     2      0
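
The zero-count rows come from the cross join. As a minimal sketch of that step in isolation (the vectors `ids`, `groups` and `values` are made up for illustration and are not taken from `DT`), `CJ()` with `unique = TRUE` builds every combination of the distinct inputs:

```r
library(data.table)

# Illustrative inputs with duplicates, as they would appear in a long table.
ids    <- c("G1", "G2", "G3", "G1", "G2", "G3")
groups <- c("A", "A", "B", "B")
values <- c(1, 0, -1, 2, 0, -2, 2, 0)

# unique = TRUE drops duplicates from each input before the cross product.
grid <- CJ(Id = ids, group = groups, value = values, unique = TRUE)

nrow(grid)  # 3 ids x 2 groups x 5 distinct values = 30 rows
```

Joining the observed data onto such a grid is what leaves `NA` in `val2` for combinations that never occur, which the final step then counts as 0.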

Used data:

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

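Note that I omitted the "Group" row, because you stated in the comments that it is only there to indicate which groups the P1-P4 columns belong to.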
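
If the data still contains the "Group" row as in the question, one possible way to split it off is sketched below in base R. The names `df_raw`, `group_map` and `df_clean` are only illustrative, and the sketch assumes the grouping row is the one whose `Names` value is "Group":

```r
# The question's original data, including the trailing "Group" row.
df_raw <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
                     P2 = c(0,-2,1,2,1,0,"A"),
                     P3 = c(-1,2,0,2,1,0,"B"),
                     P4 = c(2,0,-1,0,-1,0,"B"),
                     Names = c("G1","G2","G3","G1","G2","G3","Group"),
                     stringsAsFactors = FALSE)

p_cols <- c("P1", "P2", "P3", "P4")

# Pull the column-to-group mapping out of the "Group" row
# (a named character vector: P1 = "A", P2 = "A", P3 = "B", P4 = "B").
is_group_row <- df_raw$Names == "Group"
group_map <- unlist(df_raw[is_group_row, p_cols])

# Keep only the data rows and convert the P columns from character to numeric.
df_clean <- df_raw[!is_group_row, ]
df_clean[p_cols] <- lapply(df_clean[p_cols], as.numeric)
df_clean <- data.frame(Id = df_clean$Names, df_clean[p_cols],
                       stringsAsFactors = FALSE)
```

`df_clean` then has the same layout as the data used above, and `group_map` could replace a hard-coded column-to-group assignment.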

answered 2017-08-29T13:13:30.730

library(tidyverse)

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

We reshape the table to long format and recode the P* variables into group. Then we count and fill in the missing combinations:

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n()) %>% 
  ungroup() %>% 
  complete(Id, group, v, fill = list("Count" = 0)) 
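
The same reshape, recode, count and complete steps can also be sketched with the newer tidyr/dplyr verbs. This is an equivalent variant under the assumption that tidyr >= 1.0 is installed, not a replacement for the pipeline above; the name `counts` is only illustrative:

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

counts <- df %>%
  # Long format: one row per (Id, P column, value).
  pivot_longer(cols = P1:P4, names_to = "p", values_to = "v") %>%
  # Recode the P columns into their groups.
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>%
  # count() combines group_by(), summarise(n()) and ungroup().
  count(Id, group, v, name = "Count") %>%
  # Add the combinations that never occur, with a count of 0.
  complete(Id, group, v, fill = list(Count = 0L))
```

Note that `complete()` only expands over values of `v` that occur somewhere in the data, so a value that never appears at all would not get a row.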

If you don't need all combinations in the output, just use:

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n())

# A tibble: 17 x 4
# Groups:   Id, group [?]
      Id    group  v     Count
      <chr> <chr>  <int> <int>
 1    G1     A     0     2
 2    G1     A     1     1
 3    G1     A     2     1
 4    G1     B    -1     1
 5    G1     B     0     1
 6    G1     B     2     2
 7    G2     A    -2     1
 8    G2     A     0     2
 9    G2     A     1     1
10    G2     B    -1     1
11    G2     B     0     1
12    G2     B     1     1
13    G2     B     2     1
14    G3     A     0     3
15    G3     A     1     1
16    G3     B    -1     1
17    G3     B     0     3
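
For comparison, a minimal base R sketch of the full table, including zero counts, is given below. It assumes the same P1/P2 = A and P3/P4 = B assignment; `group_map`, `long` and `counts` are illustrative names. Fixing the factor levels of `Var` is what makes `table()` report the combinations that never occur:

```r
df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

p_cols <- c("P1", "P2", "P3", "P4")
group_map <- c(P1 = "A", P2 = "A", P3 = "B", P4 = "B")  # assumed grouping

# Stack the P columns into long form: one row per (Id, Group, Var).
long <- data.frame(
  Id    = rep(df$Id, times = length(p_cols)),
  Group = rep(unname(group_map[p_cols]), each = nrow(df)),
  Var   = unlist(df[p_cols], use.names = FALSE)
)

# Explicit levels so that unobserved values still get a cell in the table.
long$Var <- factor(long$Var, levels = -2:2)

# Three-way contingency table, flattened back into a data frame.
counts <- as.data.frame(table(long), responseName = "Count")
```
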
answered 2017-08-29T13:31:11.423