r - R rbind rows to df with results calculated by group for multiple columns by name

Question

How can I group by "id", apply some arithmetic functions to the last four columns (by group), and add new rows to the df containing the results. Here is an illustrative example with 5 samples (id) and 8 columns:

    d1   d2   id  type         treat  v1_gm  v2_pct v3_pct
1   info info 1   leaf         NA     0.2    70     90
2   info info 1   flower       A      0.5    80     80
3   info info 2   leaf         NA     0.4    65     80
4   info info 2   flower       A      0.1    90     90
5   info info 3   leaf         NA     0.6    55     80
6   info info 3   stem         A      0.3    80     30
7   info info 4   leaf         NA     0.6    30     40
8   info info 4   flower       A      0.7    75     75
9   info info 5   leaf/stem    NA     0.8    80     75

Reproducible example:

df <- data.frame(matrix(NA, nrow = 9, ncol = 8), row.names=NULL)
colnames(df) <- c("d1","d2","id","type","treat","v1_gm","v2_pct","v3_pct")
df$d1 <- "info"
df$d2 <- "info"
id <- c(1,1,2,2,3,3,4,4,5)
df$id <- c(1,1,2,2,3,3,4,4,5)
df$type <- c("leaf","flower","leaf","flower","leaf","stem","leaf","flower","leaf/stem")
df$treat <- c(NA,"A",NA,"A",NA,"A",NA,"A",NA)
df$v1_gm <- c(0.2,0.5,0.4,0.1,0.6,0.3,0.6,0.7,0.8)
df$v2_pct <- c(70,80,65,90,55,80,30,75,80)
df$v3_pct <- c(90,80,80,90,80,30,40,75,75)

The result table should look something like the following. Rows 3, 6, 9, and 13 are the new rows containing the results. The new rows can be appended at the end of the table, or put in a tmp df to be added later with rbind (I can't figure out how to do it, either way). The grouping var is column "id". The function sum is used for "v1_gm". The function "mean" is used for multiple consecutive columns, here "v1_pct" and "v3_pct", that should be called by name (e.g., v1_pct:v3_pct). The value for "type" in the new rows is concatenated from "type" in the group rows, "d1" and "d2" are simply copied from the group row where treat=="A", and "treat" in the new row is assigned the value "cmb".

    d1   d2   id type         treat v1_gm  v2_pct  v3_pct
1   info info 1  leaf         NA    0.2    70      90
2   info info 1  flower       A     0.5    80      80
3   info info 1  leaf/flower  cmb   0.7    75      85
4   info info 2  leaf         NA    0.4    65      80
5   info info 2  flower       A     0.1    90      90
6   info info 2  leaf/flower  cmb   0.5    77.5    85
7   info info 3  leaf         NA    0.6    55      80
8   info info 3  stem         A     0.3    80      30
9   info info 3  leaf/stem    cmb   0.9    67.5    55
10  info info 4  leaf         NA    0.6    30      40
11  info info 4  flower       A     0.7    75      75
13  info info 4  leaf/flower  cmb   1.3    52.5    57.5
14  info info 5  leaf/stem    NA    0.8    80      75

score 1 · Accepted Answer

I'm unsure if you can add the group summaries as a row to the dataframe. You should be able to do it as a column.

library("dplyr")
res1 <- df %>% group_by(id) %>% mutate( sumV1 = sum(v1_gm),meanV2 = mean(v2_pct),meanV3 = mean(v3_pct),gr_type = paste(type,collapse="/")) %>% filter(treat == "A") %>% select(d1,d2,id,type,v1_gm=sumV1, v2_pct = meanV2, v3_pct = meanV3,type = gr_type)

this will give you the answers and then use bind_rows you'll get your desired result

final_res <- bind_rows(df,res1)

score 0 · Accepted Answer

With some modification of @Arun 's answer, the following script solves the problem completely.

library("dplyr")
res1 <- df %>%  
  group_by(id) %>%  
  mutate(  
    v1_gm = sum(v1_gm),  
    v2_pct = mean(v2_pct),  
    v3_pct = mean(v3_pct),  
    type = paste(type,collapse="/")) %>%  
  filter(treat == "A") %>%  
  mutate(treat = as.character("calculated"))  
final_res1 <- bind_rows(df,res1)  
final_res1$id <- as.character(final_res1$id)  
final_res1 <- final_res1 [order(final_res1$id, final_res1$treat, na.last=FALSE),  ]

r - R rbind rows to df with results calculated by group for multiple columns by name

2 回答 2

Related

Reference