
I have the following dataset, which I read from a CSV file and loaded into a data frame called df:

Name   Value1 Value2
A       2       5       
A       1       5       
B       3       4       
B       1       4       
C       0       3       
C       5       3       
C       1       3       

If I run the following command in R:

library(plyr)
out <- ddply(df, .(Name), summarize, Value1_mean = mean(Value1), Value2_mean = mean(Value2))

I get this output:

Name   Value1_mean   Value2_mean    
A       1.5             5       
B       2               4       
C       2               3       

But I need to compute the means of Value1 and Value2 and store them in separate columns, e.g. value1_mean and value2_mean, repeated on every row:

Name   Value1 Value2   value1_mean  value2_mean
A       2       5       1.5           5
A       1       5       1.5           5
B       3       4       2             4
B       1       4       2             4
C       0       3       2             3
C       5       3       2             3
C       1       3       2             3

How can I get the output above?


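We can use data.table. Convert the data.frame to a data.table (setDT(df)), group by Name, specify the columns to average in .SDcols, loop over the Subset of Data.table (.SD) with lapply to get the mean of each column, and assign (:=) the results to new columns.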

library(data.table)
setDT(df)[, paste0(names(df)[2:3], "_mean") := lapply(.SD, mean), by = Name, .SDcols = 2:3]
df
#    Name Value1 Value2 Value1_mean Value2_mean
#1:    A      2      5         1.5           5
#2:    A      1      5         1.5           5
#3:    B      3      4         2.0           4
#4:    B      1      4         2.0           4
#5:    C      0      3         2.0           3
#6:    C      5      3         2.0           3
#7:    C      1      3         2.0           3
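For comparison, a minimal base-R sketch that needs no extra packages, assuming the numeric columns are named Value1 and Value2 as in the data below. ave() applies a function within each group but returns a vector as long as its input, so every row receives its group's mean:

```r
# Sample data from the question
df <- data.frame(Name   = c("A", "A", "B", "B", "C", "C", "C"),
                 Value1 = c(2, 1, 3, 1, 0, 5, 1),
                 Value2 = c(5, 5, 4, 4, 3, 3, 3))

# ave(x, group, FUN = mean) computes mean(x) within each Name group and
# repeats that value on every row of the group
df$Value1_mean <- ave(df$Value1, df$Name, FUN = mean)
df$Value2_mean <- ave(df$Value2, df$Name, FUN = mean)
df
```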

Or, with dplyr, we can use mutate_each:

library(dplyr)
df %>%
   group_by(Name) %>%
   mutate_each(funs(Mean = mean)) 
#    Name Value1 Value2 Value1_Mean Value2_Mean
#  <chr>  <int>  <int>       <dbl>       <dbl>
#1     A      2      5         1.5           5
#2     A      1      5         1.5           5
#3     B      3      4         2.0           4
#4     B      1      4         2.0           4
#5     C      0      3         2.0           3
#6     C      5      3         2.0           3
#7     C      1      3         2.0           3
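mutate_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same step with across(), assuming dplyr 1.0.0 or later; the .names template "{.col}_mean" builds each new column name from the original one:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(Name   = c("A", "A", "B", "B", "C", "C", "C"),
                 Value1 = c(2, 1, 3, 1, 0, 5, 1),
                 Value2 = c(5, 5, 4, 4, 3, 3, 3))

out <- df %>%
  group_by(Name) %>%
  # mutate() keeps all rows; across() applies mean() to each listed column
  # within the current Name group
  mutate(across(c(Value1, Value2), mean, .names = "{.col}_mean")) %>%
  ungroup()
out
```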

Data

df <- structure(list(Name = c("A", "A", "B", "B", "C", "C", "C"), Value1 = c(2L, 
1L, 3L, 1L, 0L, 5L, 1L), Value2 = c(5L, 5L, 4L, 4L, 3L, 3L, 3L
)), .Names = c("Name", "Value1", "Value2"), class = "data.frame", 
row.names = c(NA, -7L))
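Since the question already uses plyr, here is a sketch that stays with ddply, assuming plyr is installed: swapping summarize for transform keeps every row of each group and appends the group means as new columns:

```r
library(plyr)

# Sample data from the question
df <- data.frame(Name   = c("A", "A", "B", "B", "C", "C", "C"),
                 Value1 = c(2, 1, 3, 1, 0, 5, 1),
                 Value2 = c(5, 5, 4, 4, 3, 3, 3))

# transform (unlike summarize) returns all rows of each Name group;
# mean(Value1) is evaluated per group and recycled to the group's length
out <- ddply(df, .(Name), transform,
             Value1_mean = mean(Value1),
             Value2_mean = mean(Value2))
out
```

If dplyr is also used in the same session, loading plyr first avoids masking conflicts between the two packages.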
answered 2017-01-03T08:05:38.783