
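So I'm a beginner with R. There seems to be a quick and dirty way to calculate the mean of a column over each block of n rows, but is there something similar for the standard deviation (or standard error)? If possible I'd like to avoid loops, since this is only a small part of my increasingly clunky (for a beginner) code. Here is a simplified example of the data set I'll be using: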

     Canopy Species    Date            Pa
1     Maple    BETH    4/26/2014 -0.1162607263
2     Maple    BETH    4/26/2014 -0.2742194706
3     Maple    BETH    4/26/2014 -0.1864006372
4     Maple    BETH    4/26/2014 -0.0739905518
5     Maple    BETH    4/26/2014 -0.0751169983
6     Maple    BETH    4/26/2014 -0.0782771938
7     Maple    BETH    4/26/2014 -0.1671646757
8     Maple    BETH    4/26/2014 -0.2464696338
9     Maple    BETH    4/26/2014 -0.2176720386
10    Maple    BETH    4/26/2014 -0.2283216397
11    Maple    BETH    4/26/2014 -0.1152989165
12    Maple    BETH    4/26/2014 -0.2720884764
13    Maple    BETH    4/26/2014 -0.1849383730
14    Maple    BETH    4/26/2014 -0.0734205199
15    Maple    BETH    4/26/2014 -0.0745294634
16    Maple    BETH    4/26/2014 -0.0776640601
17    Maple    BETH    4/26/2014 -0.1658603785
18    Maple    BETH    4/26/2014 -0.2445047320
19    Maple    BETH    4/26/2014 -0.2159337593
20    Maple    BETH    4/26/2014 -0.2264833266

Here is the example code I'm referring to. It finds the mean of every 10 rows in the Pa column:

mu<-colMeans(matrix(Table$Pa, nrow=10))
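The same matrix trick extends to the standard deviation: reshape the column so each matrix column is one block of 10 rows, then apply sd() column-wise with apply(). A minimal sketch, assuming the Pa values from the example above and that the number of rows is a multiple of the block size; the standard error is computed as sd / sqrt(block size):

```r
# Pa values copied from the example data in the question; `Table` mirrors
# the asker's data frame (only the Pa column is needed here).
Table <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
))

n <- 10  # rows per block; assumes nrow(Table) is a multiple of n

# matrix() fills column by column, so each column holds n consecutive rows.
blocks <- matrix(Table$Pa, nrow = n)

mu    <- colMeans(blocks)          # mean of each block, as in the question
sigma <- apply(blocks, 2, sd)      # sample standard deviation of each block
se    <- sigma / sqrt(n)           # standard error of each block mean

print(round(cbind(mu, sigma, se), 6))
```

If the row count is not a multiple of n, matrix() recycles values to fill the last column (with a warning), so the last block's statistics would be wrong; the grouping-index approaches in the answers below avoid that.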

Thanks in advance for your help, and let me know if I should provide more information.


3 Answers


You can also do this in base R with the by() function:

> n<-nrow(Table)
> index<-ceiling((1:n)/10)
> by(Table$Pa,index,mean)
index: 1
[1] -0.1663894
------------------------------------------------------------ 
index: 2
[1] -0.1650722
> by(Table$Pa,index,sd)
index: 1
[1] 0.07604938
------------------------------------------------------------ 
index: 2
[1] 0.07544763
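The question also asks about the standard error. The same grouping index works with any function that reduces a vector to one number, so a small helper is enough. A minimal sketch, assuming the Pa values from the question; it uses tapply() instead of by() because tapply() returns a plain named array, and `se` is a hypothetical helper name:

```r
# Pa values copied from the question's example data.
Table <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
))

n <- nrow(Table)
index <- ceiling((1:n) / 10)  # 1 for rows 1-10, 2 for rows 11-20, ...

# Hypothetical helper: standard error of the mean, sd(x) / sqrt(length(x)).
se <- function(x) sd(x) / sqrt(length(x))

# tapply() splits Table$Pa by index and applies se() to each piece.
se_by_group <- tapply(Table$Pa, index, se)
print(se_by_group)
```

Because se() uses length(x), a shorter final block (when the row count is not a multiple of 10) still gets the right denominator.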

Edit: you can put them together in one table, for example like this:

> cbind(index = unique(index), mean = by(Table$Pa, index, mean), sd = by(Table$Pa, index, sd))

  index       mean         sd
1     1 -0.1663894 0.07604938
2     2 -0.1650722 0.07544763
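cbind() on these inputs produces a matrix. If you would rather have a data frame (for example to add more columns or write the result out with write.csv()), here is a minimal sketch using tapply() and data.frame(), assuming the Pa values from the question and the same ceiling() grouping index; the se column is an illustrative addition:

```r
# Pa values copied from the question's example data.
Table <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
))
index <- ceiling(seq_len(nrow(Table)) / 10)  # block number for each row

# tapply() orders its results by the sorted group values, which matches
# sort(unique(index)) in the first column.
summary_df <- data.frame(
  index = sort(unique(index)),
  mean  = as.vector(tapply(Table$Pa, index, mean)),
  sd    = as.vector(tapply(Table$Pa, index, sd))
)
# Standard error per block: sd divided by the square root of the block size.
summary_df$se <- summary_df$sd / sqrt(as.vector(table(index)))
print(summary_df)
```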
answered 2016-02-15T15:37:43.033

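Here is a mixed base R/dplyr solution: first I created a column called fac_to_spli, the factor that splits the rows into blocks for the standard deviation, then I did the calculation with dplyr's group_by and mutate.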

library(dplyr)
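# one label per row: 1 for rows 1-10, 11 for rows 11-20, and so on (works for any number of rows)
df$fac_to_spli <- rep(seq(from = 1, to = nrow(df), by = 10), each = 10, length.out = nrow(df))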
df %>% group_by(fac_to_spli) %>% mutate(stand_dev = sd(Pa))

Source: local data frame [20 x 6]
Groups: fac_to_spli [2]

   Canopy Species      Date          Pa fac_to_spli  stand_dev
   (fctr)  (fctr)    (fctr)       (dbl)       (dbl)      (dbl)
1   Maple    BETH 4/26/2014 -0.11626073           1 0.07604938
2   Maple    BETH 4/26/2014 -0.27421947           1 0.07604938
3   Maple    BETH 4/26/2014 -0.18640064           1 0.07604938
4   Maple    BETH 4/26/2014 -0.07399055           1 0.07604938
5   Maple    BETH 4/26/2014 -0.07511700           1 0.07604938
6   Maple    BETH 4/26/2014 -0.07827719           1 0.07604938
7   Maple    BETH 4/26/2014 -0.16716468           1 0.07604938
8   Maple    BETH 4/26/2014 -0.24646963           1 0.07604938
9   Maple    BETH 4/26/2014 -0.21767204           1 0.07604938
10  Maple    BETH 4/26/2014 -0.22832164           1 0.07604938
11  Maple    BETH 4/26/2014 -0.11529892          11 0.07544763
12  Maple    BETH 4/26/2014 -0.27208848          11 0.07544763
13  Maple    BETH 4/26/2014 -0.18493837          11 0.07544763
14  Maple    BETH 4/26/2014 -0.07342052          11 0.07544763
15  Maple    BETH 4/26/2014 -0.07452946          11 0.07544763
16  Maple    BETH 4/26/2014 -0.07766406          11 0.07544763
17  Maple    BETH 4/26/2014 -0.16586038          11 0.07544763
18  Maple    BETH 4/26/2014 -0.24450473          11 0.07544763
19  Maple    BETH 4/26/2014 -0.21593376          11 0.07544763
20  Maple    BETH 4/26/2014 -0.22648333          11 0.07544763
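The same per-row result is available in base R without dplyr: ave() applies a function within each group and returns a vector as long as its input, which is what group_by() followed by mutate() does here. A minimal sketch, assuming the Pa values from the question stored in a data frame named Table:

```r
# Pa values copied from the question's example data.
Table <- data.frame(Pa = c(
  -0.1162607263, -0.2742194706, -0.1864006372, -0.0739905518, -0.0751169983,
  -0.0782771938, -0.1671646757, -0.2464696338, -0.2176720386, -0.2283216397,
  -0.1152989165, -0.2720884764, -0.1849383730, -0.0734205199, -0.0745294634,
  -0.0776640601, -0.1658603785, -0.2445047320, -0.2159337593, -0.2264833266
))

# Block label per row: the first row number of each block (1, 11, ...).
Table$fac_to_spli <- rep(seq(from = 1, to = nrow(Table), by = 10),
                         each = 10, length.out = nrow(Table))

# ave() computes sd() within each block and repeats it on every row of that block.
Table$stand_dev <- ave(Table$Pa, Table$fac_to_spli, FUN = sd)
print(Table)
```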
answered 2016-02-15T15:32:38.203

What @rawr is saying, using the dplyr package:

library(dplyr)
df %>%
  mutate(id = (row_number() - 1) %/% 10) %>%  # rows 1-10 -> group 0, rows 11-20 -> group 1, ...
  group_by(id) %>%
  summarize(mean = mean(Pa), sd = sd(Pa))

     id       mean         sd
  (dbl)      (dbl)      (dbl)
1     0 -0.1663894 0.07604938
2     1 -0.1650722 0.07544763
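The grouping index has to put exactly ten consecutive rows in each group. Tabulating a candidate index with table() is a quick check. Rounding the scaled row number, as in round(row_number()/10), does not give equal blocks, partly because R's round() sends halves to the even digit (round(0.5) is 0, round(1.5) is 2). A minimal sketch on 20 row numbers:

```r
rows <- 1:20

# Rounding: rows 1-5 -> 0, rows 6-14 -> 1, rows 15-20 -> 2 (uneven blocks).
sizes_round <- as.vector(table(round(rows / 10)))

# Integer division of the zero-based row number: exactly 10 rows per block.
sizes_div <- as.vector(table((rows - 1) %/% 10))

print(sizes_round)
print(sizes_div)
```

ceiling(rows / 10), as used in the by() answer, gives the same equal blocks as the integer division, just numbered from 1 instead of 0.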
answered 2016-02-15T15:32:46.267