
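This may have been asked before, but I couldn't find it. I have a dataset where the column names are numbers and the row names are sample names (see below).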

"599.773" "599.781" "599.789" "599.797" "599.804" "599.812" "599.82" "599.828" 
"A" 0 0 0 0 0 2 1 4  
"B" 0 0 0 0 0 1 0 3  
"C" 0 0 0 0 2 1 0 1  
"D" 3 0 0 0 3 1 0 0 

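I want to bin the columns by summing them, for example every 4 columns, and then name each new column with the mean of the binned column names. For the table above, I would end up with: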

"599.785" "599.816" 
"A" 0 7 
"B" 0 4  
"C" 0 4  
"D" 3 4 

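The new column names 599.785 and 599.816 are the means of the column names that were binned together. I think something like cut would work for a numeric vector, but I'm not sure how to apply it to a large data frame. Thanks for your help!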
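For reference, a minimal sketch of the difference between binning by value with cut and binning by position ("every 4 columns"). The variable names `pos`, `by_value` and `by_position` are made up for illustration. On this table the two indices happen to agree because the column names are almost evenly spaced; with irregular spacing they generally would not.

```r
# Column names from the example table, as numbers
pos <- c(599.773, 599.781, 599.789, 599.797,
         599.804, 599.812, 599.82, 599.828)

# cut() bins by value: two equal-width intervals over range(pos)
by_value <- cut(pos, breaks = 2, labels = FALSE)

# Binning by position: every 4 consecutive columns share an index
by_position <- ceiling(seq_along(pos) / 4)

by_value
by_position
```

Either index can then serve as the grouping vector when summing the columns.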


2 Answers


First of all, using numeric values as column names is not good/standard practice.
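As a small illustration of why: names that start with a digit are not syntactically valid in R, so `make.names` (which `read.table` applies when `check.names = TRUE`, the default) rewrites them. The variable names below are only for this sketch.

```r
cols <- c("599.773", "599.82")

# make.names() prepends "X" to names that start with a digit
fixed <- make.names(cols)
fixed

# Stripping the prefix recovers the numeric values
recovered <- as.numeric(sub("^X", "", fixed))
recovered
```

This is why the data has to be read with `check.names = FALSE` if the original names are to be kept as they are.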

Even so, here is a solution that does what the OP asked for.

## read data without checking names 
dt <- read.table(text='
"599.773" "599.781" "599.789" "599.797" "599.804" "599.812" "599.82" "599.828" 
"A" 0 0 0 0 0 2 1 4  
"B" 0 0 0 0 0 1 0 3  
"C" 0 0 0 0 2 1 0 1  
"D" 3 0 0 0 3 1 0 0',header=TRUE, check.names =FALSE)

cols <- as.numeric(colnames(dt))
## bin index for each column: 1 for columns 1-4, 2 for columns 5-8, ...
ff   <- ceiling(seq_along(cols)/4)
## use tapply to sum the rows of each group of columns
vals <- do.call(cbind,tapply(cols,ff,
       function(x)
         rowSums(dt[,paste0(x),drop=FALSE])))
nn <- tapply(cols,ff,mean)
## name the new columns with the group means
colnames(vals) <- nn[colnames(vals)]

vals
  599.785 599.816
A       0       7
B       0       4
C       0       4
D       3       4
Answered 2013-07-11T18:14:51.027
col_names <- c("599.773", "599.781", "599.789", "599.797", 
               "599.804", "599.812", "599.82", "599.828")
## scan(text = ...) reads the values from a string instead of the console
mat <- matrix(scan(text = "
  0 0 0 0 0 2 1 4
  0 0 0 0 0 1 0 3
  0 0 0 0 2 1 0 1
  3 0 0 0 3 1 0 0"), nrow=4, byrow=TRUE)

 colnames(mat) <- col_names
 rownames(mat) <- LETTERS[1:4]

 ## sum the given columns within each row
 sRows <- function(mat, cols) rowSums(mat[, cols])
 ## first column of each bin of 4: 1, 5, ...
 starts <- seq(1, ncol(mat), by=4)
 sapply(starts, function(base) sRows(mat, base:(base+3)) )

  [,1] [,2]
A    0    7
B    0    4
C    0    4
D    3    4

accum <- sapply(starts, function(base) 
                      sRows(mat, base:(base+3)) )
## name each new column with the mean of the binned column names
colnames(accum) <- sapply(starts, 
                          function(base)   
                      mean(as.numeric(colnames(mat)[ base:(base+3)] )) )
accum
#-------
  599.785 599.816
A       0       7
B       0       4
C       0       4
D       3       4
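
The per-bin loop can also be written without explicit column ranges, using a grouping index and base R's `rowsum` on the transposed matrix. This is only a sketch: `bin_columns` and its argument `k` are hypothetical names, not part of either answer, and it assumes the column names convert cleanly to numbers. If the number of columns is not a multiple of `k`, the last bin simply holds fewer columns.

```r
# Hypothetical helper (not from the answers): sum every k consecutive columns
bin_columns <- function(m, k = 4) {
  m    <- as.matrix(m)
  pos  <- as.numeric(colnames(m))         # numeric column names
  bin  <- ceiling(seq_len(ncol(m)) / k)   # 1,1,1,1,2,2,2,2,...
  sums <- t(rowsum(t(m), group = bin))    # sum the columns within each bin
  # name each new column with the mean of the names in its bin
  colnames(sums) <- as.vector(tapply(pos, bin, mean))
  sums
}

# The example table from the question
m <- matrix(c(0, 0, 0, 0, 0, 2, 1, 4,
              0, 0, 0, 0, 0, 1, 0, 3,
              0, 0, 0, 0, 2, 1, 0, 1,
              3, 0, 0, 0, 3, 1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4],
                            c("599.773", "599.781", "599.789", "599.797",
                              "599.804", "599.812", "599.82", "599.828")))

res <- bin_columns(m, k = 4)
res
```

Because `rowsum` does the grouping in a single call, this may scale better to wide tables than looping over bins, though that has not been measured here.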
Answered 2013-07-11T17:57:11.037