I want to compute the mean of each numeric column and put the result in a new row below the data. Let's start with some data:

> head(tbl_mut)

     timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
    1   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
    2  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
    3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
    4   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92

This is what I want to achieve:

     timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
    1   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
    2  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
    3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
    4   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
    .....
    445    X          X          X          X          X         X           X          X

X is the mean of the values in that column.

Note that the data may also contain non-numeric columns.

2 Answers

Use rbind and colMeans:

> rbind(tbl_mut, colMeans = colMeans(tbl_mut))
          timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1          4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2         45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3        639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4          4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
colMeans 173482.724  497479.54  319083.15  330634.05  331434.59 160144.458  369657.83 264901.15
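
As a reproducible sketch of the call above, the snippet below rebuilds the four rows shown in the question, keeping only three of the eight columns for brevity (that trimming is an assumption made here, not part of the original data). It shows that colMeans returns a named numeric vector and that the argument name passed to rbind becomes the row name.

```r
# First four rows of the question's data (three of the eight columns, for brevity)
tbl_mut <- data.frame(
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32),
  timetE2_2 = c(27675.59, 111309.64, 1117027.29, 20320.08)
)

# colMeans() returns a named numeric vector with one entry per column
col_means <- colMeans(tbl_mut)

# rbind() appends the vector as a new row; the argument name becomes the row name
with_means <- rbind(tbl_mut, colMeans = col_means)
print(with_means)
```

This only works as-is when every column is numeric, because colMeans stops with an error on non-numeric input.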

Edit

Suppose your data frame contains both numeric and non-numeric columns (such as a "Description" column):

> df
  Description  timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1           A   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2           B  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3           C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4           D   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92

...then you can use sapply(df, is.numeric) to find the numeric columns and compute colMeans on just those. Because rbind fills the new row by position rather than by name, pad the vector with an NA for the leading Description column so that each mean lands under its own column:

> rbind(df, colMeans = c(NA, colMeans(df[, sapply(df, is.numeric)])))
         Description  timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1                  A   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2                  B  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3                  C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4                  D   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
colMeans        <NA> 173482.724  497479.54  319083.15  330634.05  331434.59 160144.458  369657.83 264901.15

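Alternatively, if you know the index of the non-numeric variable (for example, the first column), you can drop it with df[, -1] and put an NA in its place in the new row:

rbind(df, colMeans = c(NA, colMeans(df[, -1])))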
answered 2013-11-05T11:28:04.007

R does have a function, addmargins, that lets you do this kind of thing, but it requires a table or a matrix as input.

addmargins(as.matrix(mydf), 1, FUN = mean)
#       timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
# 1      4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
# 2     45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
# 3    639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
# 4      4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
# mean 173482.724  497479.54  319083.15  330634.05  331434.59 160144.458  369657.83 264901.15
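
When the data frame also holds non-numeric columns, as.matrix would produce a character matrix, so one option is to pass only the numeric columns to addmargins. The following is a minimal sketch under that assumption; the data is a trimmed copy of the question's rows, and the row names are set explicitly so the margin row has labels to extend.

```r
# Trimmed copy of the question's data with one non-numeric column
mydf <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32)
)

# Keep only the numeric columns so as.matrix() yields a numeric matrix
is_num <- sapply(mydf, is.numeric)
m <- as.matrix(mydf[is_num])
rownames(m) <- rownames(mydf)

# margin = 1 adds one extra row, holding FUN applied down each column
with_margin <- addmargins(m, 1, FUN = mean)
print(with_margin)
```

The result is a matrix, so the Description column is dropped rather than carried along with an NA.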

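Update

There is a nearly identical (conceptually) question here, so I thought I would share my answer from there as well.

Suppose we start with: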

mydf <- structure(list(Description = c("A", "B", "C", "D"), 
    timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153), 
    Boo = structure(1:4, .Label = c("a", "b", "c", "d"), 
    class = "factor"), timetE1_2 = c(59094.48, 139889.21, 
    1764684.16, 26250.32), timetE2_2 = c(27675.59, 111309.64, 
    1117027.29, 20320.08), Baa = c(FALSE, FALSE, TRUE, NA)), 
    .Names = c("Description", "timetE4_1", "Boo", "timetE1_2", 
    "timetE2_2", "Baa"), row.names = c("1", "2", "3", "4"), 
    class = "data.frame")

mydf
#   Description  timetE4_1 Boo  timetE1_2  timetE2_2   Baa
# 1           A   4048.605   a   59094.48   27675.59 FALSE
# 2           B  45729.986   b  139889.21  111309.64 FALSE
# 3           C 639686.154   c 1764684.16 1117027.29  TRUE
# 4           D   4466.153   d   26250.32   20320.08    NA

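@Jilber's solution would not work in this case and would result in a lot of misaligned columns. Instead, use rbind.fill from "plyr". I have used sapply in this example to specify my function (sum here), to show that it is easy to use any function you like, not just the col* functions.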

library(plyr)
useme <- sapply(mydf, is.numeric)
rbind.fill(mydf, data.frame(t(sapply(mydf[useme], sum))))
#   Description  timetE4_1  Boo  timetE1_2  timetE2_2   Baa
# 1           A   4048.605    a   59094.48   27675.59 FALSE
# 2           B  45729.986    b  139889.21  111309.64 FALSE
# 3           C 639686.154    c 1764684.16 1117027.29  TRUE
# 4           D   4466.153    d   26250.32   20320.08    NA
# 5        <NA> 693930.898 <NA> 1989918.17 1276332.60    NA
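
The same name-based matching can be sketched in base R without plyr, since rbind matches columns by name when both arguments are data frames. The helper name append_summary_row below is hypothetical, and the data is a trimmed copy of mydf; treat it as one possible approach rather than the answer's own method.

```r
# Trimmed copy of mydf with non-numeric columns interleaved
mydf <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  Boo = factor(c("a", "b", "c", "d")),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32),
  Baa = c(FALSE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append one row holding fun() of every numeric column
append_summary_row <- function(df, fun = mean, ...) {
  is_num <- vapply(df, is.numeric, logical(1))
  # One-row data frame with fun() applied to each numeric column
  stats <- data.frame(lapply(df[is_num], fun, ...), check.names = FALSE)
  # Non-numeric columns get NA in the summary row
  if (any(!is_num)) stats[names(df)[!is_num]] <- NA
  # Reorder to the original column order, then bind; columns match by name
  rbind(df, stats[names(df)])
}

res <- append_summary_row(mydf)                 # column means
res_sum <- append_summary_row(mydf, fun = sum)  # column sums, as in the plyr example
print(res)
```

Extra arguments are forwarded to the function, so append_summary_row(mydf, mean, na.rm = TRUE) would skip missing values.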
answered 2013-11-05T11:48:08.757