I am trying to apply a certain function to groups of columns from a data frame based upon a 'design' vector containing the column indices that are part of the same experimental design 'group' (i.e. replicates). My observations are the rows, my sampling points are the columns.
The design vector designates which columns should group together:

designvector <- c(rep(1,2), rep(2,3), rep(3,3), rep(4,2), rep(5,2), rep(6,2), 
                       rep(7,2), rep(8,2), rep(9,2))

A small example of the data frame to which I want to apply the function is:

structure(list(`1` = c(4381L, 608L, 7648L, 458L, 350L, 203L), 
`1` = c(6450L, 1389L, 4896L, 526L, 920L, 352L), `2` = c(1966L, 
59L, 492L, 5291L, 1401L, 133L), `2` = c(6338L, 281L, 2649L, 
4718L, 1281L, 377L), `2` = c(12399L, 578L, 3094L, 1787L, 
1180L, 541L), `3` = c(9629L, 554L, 7299L, 2819L, 1314L, 497L
), `3` = c(11329L, 709L, 3720L, 2909L, 1929L, 655L), `3` = c(11319L, 
535L, 5212L, 2191L, 1239L, 633L), `4` = c(7427L, 8637L, 894L, 
2L, 782L, 120L), `4` = c(6748L, 9139L, 431L, 28L, 871L, 224L
), `5` = c(7125L, 11819L, 1728L, 9L, 607L, 313L), `5` = c(8651L, 
11022L, 442L, 96L, 728L, 249L), `6` = c(17879L, 3402L, 319L, 
6L, 1226L, 489L), `6` = c(20859L, 2648L, 463L, 10L, 1189L, 
408L), `7` = c(13457L, 1124L, 9386L, 18L, 635L, 367L), `7` = c(16292L, 
1732L, 6552L, 20L, 1022L, 431L), `8` = c(9035L, 5887L, 185L, 
11L, 550L, 1814L), `8` = c(14831L, 5833L, 570L, 8L, 1089L, 
1462L), `9` = c(22023L, 2254L, 5212L, 63L, 555L, 1254L), 
`9` = c(16887L, 2491L, 4949L, 68L, 921L, 983L)), .Names = c("1", 
"1", "2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "6", "6", 
"7", "7", "8", "8", "9", "9"), row.names = c(NA, 6L), class = "data.frame")

However, using ddply I get an error which I do not really understand: ddply(abmat.sum,.(designvector),mean) gives the following output:

designvector V1
1            1 NA
2            2 NA
3            3 NA
4            4 NA
5            5 NA
6            6 NA
7            7 NA
8            8 NA
9            9 NA
Warning messages:
1: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
5: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
6: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
7: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
8: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA
9: In mean.default(piece, ...) :
  argument is not numeric or logical: returning NA

I am clueless as to what I am doing wrong here. Any suggestions using ddply, or methods other than for-looping over the data frame, are welcome.

1 Answer

The problem is that abmat.sum is in the wrong form: it is "wide" rather than "long", and ddply requires the long form. Use melt to fix this.

library(reshape2)
abmat.sum_long <- melt(abmat.sum)
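# melt() returns 'variable' as a factor; convert via character so the group labels, not the level codes, are used
abmat.sum_long$variable <- as.numeric(as.character(abmat.sum_long$variable))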

You also need to pass summarise to ddply.

library(plyr)
ddply(abmat.sum_long, .(variable), summarise, mean_value = mean(value))
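
If reshaping is not wanted, the same grouping can probably be done in base R by splitting the column positions on the design vector. This is only a minimal sketch and not part of the original answer: toy and toy_design are made-up stand-ins for abmat.sum and designvector, and the values are invented for illustration.

```r
# Toy stand-in for abmat.sum: 3 observations (rows), 5 sampling points (columns).
# The names and values here are hypothetical.
toy <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), c = c(10, 20, 30),
                  d = c(20, 30, 40), e = c(30, 40, 50))

# Toy stand-in for designvector: columns 1-2 form group 1, columns 3-5 form group 2.
toy_design <- c(1, 1, 2, 2, 2)

# List of column positions belonging to each design group.
groups <- split(seq_along(toy_design), toy_design)

# One mean per group, pooled over all rows and replicate columns
# (the same quantity the melt + ddply approach computes).
group_means <- sapply(groups, function(idx) {
  mean(as.matrix(toy[, idx, drop = FALSE]))
})

# One mean per observation within each group:
# the result has one row per observation and one column per group.
row_means <- sapply(groups, function(idx) {
  rowMeans(toy[, idx, drop = FALSE])
})

print(group_means)
print(row_means)
```

Selecting columns by position rather than by name sidesteps the duplicated column names in the original data frame. Replacing mean or rowMeans with another function should work the same way, provided it accepts a matrix or a data frame of replicate columns.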
answered 2013-07-01T13:36:43.073