I have a table in the following format:

Class1 0.438 0.441 0.442 0.444 0.545 0.546 0.548 0.609 0.651 0.652 0.655 
  DAWO     2     2     0     1     0     0     0     1     1     5     1  
  DRWO     1     1     3     1     1     1     1     0     0     1     0   
  DHWO     1     2     0     0     0     0     0     0     0     0     0   

I want to reduce the size of the table by merging columns according to their names and summing the values. For example:

Class1    0.4   0.5   0.6 
  DAWO     5     0     8      
  DRWO     6     3     1    
  DHWO     3     0     0    

How can this be done? Thanks in advance for your help.

2 Answers

x <- read.table(header=TRUE, text="      0.438 0.441 0.442 0.444 0.545 0.546 0.548 0.609 0.651 0.652 0.655
DAWO     2     2     0     1     0     0     0     1     1     5     1
DRWO     1     1     3     1     1     1     1     0     0     1     0
DHWO     1     2     0     0     0     0     0     0     0     0     0   ", check.names=FALSE)

Note that I did not copy the text `Class1`, so `DAWO` etc. become the row names of the data set.

First, transpose the data so that `aggregate` can work on it:

tx <- as.data.frame(t(x))

These are the cut points. This assumes the values lie between 0 and 1; adjust as needed.

tx$bin <- cut(as.numeric(rownames(tx)), breaks=seq(0,1,.1))
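One detail worth knowing about the binning step: by default `cut` builds right-closed intervals, so a column named exactly `0.5` would land in `(0.4,0.5]`, not with the other 0.5x columns. The sketch below is not part of the original answer. It assumes left-closed bins are wanted and uses `right = FALSE` with labels taken from each interval's lower bound; `vals` is just the question's column names typed in by hand.

```r
# Column names from the question, as numbers
vals <- c(0.438, 0.441, 0.442, 0.444, 0.545, 0.546, 0.548,
          0.609, 0.651, 0.652, 0.655)
breaks <- seq(0, 1, 0.1)

# Default behaviour: intervals are (a, b], so 0.5 falls into the lower bin
default_edge <- as.character(cut(0.5, breaks = breaks))
print(default_edge)

# right = FALSE gives [a, b); label each bin with its lower bound
bin <- cut(vals, breaks = breaks, right = FALSE,
           labels = sprintf("%.1f", head(breaks, -1)))
print(table(droplevels(bin)))
```

For the data in the question no column name sits exactly on a break, so both settings group the columns the same way; only the labels differ.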

Sum the values, set the names, and transpose again:

xx <- aggregate(.~bin, data=tx, FUN=sum)
rownames(xx) <- xx$bin
t(xx[-1])
##      (0.4,0.5] (0.5,0.6] (0.6,0.7]
## DAWO         5         0         8
## DRWO         6         3         1
## DHWO         3         0         0
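As a cross-check on the steps above, the same grouping can be sketched with base R's `rowsum`, which sums the rows of a matrix by group. This is an alternative to the `aggregate` call, not the method used in the answer; it reuses the same `cut` bins and rebuilds `x` so that the snippet runs on its own.

```r
x <- read.table(header = TRUE, check.names = FALSE, text = "0.438 0.441 0.442 0.444 0.545 0.546 0.548 0.609 0.651 0.652 0.655
DAWO 2 2 0 1 0 0 0 1 1 5 1
DRWO 1 1 3 1 1 1 1 0 0 1 0
DHWO 1 2 0 0 0 0 0 0 0 0 0")

# One bin per original column, based on the numeric column name
bin <- cut(as.numeric(names(x)), breaks = seq(0, 1, 0.1))

# rowsum() groups rows, so transpose, sum by bin, and transpose back
res <- t(rowsum(t(as.matrix(x)), group = bin))
print(res)
```

The result is a matrix with the class names as row names and one column per non-empty bin.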
answered 2013-04-16T02:03:37.843

Here is another option. Using `x` from @Matthew's answer, you can apply `strtrim` to the column names to create categories, then use `sapply` to sum within each category.

mymatch <- strtrim(names(x), 3)
sapply(unique(mymatch), function(y) rowSums(x[, mymatch == y, drop = FALSE]))
#      0.4 0.5 0.6
# DAWO   5   0   8
# DRWO   6   3   1
# DHWO   3   0   0
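To make the grouping key explicit: `strtrim(names, 3)` keeps the first three characters of each name, so it truncates rather than rounds. The short sketch below uses a hand-picked subset of the question's column names and compares the result with a numeric floor to one decimal place. It assumes every name has the form `0.xyz`; names such as `10.25` would need a different key.

```r
nms <- c("0.438", "0.441", "0.545", "0.609", "0.655")

# Character truncation: "0.655" becomes "0.6", not "0.7"
keys <- strtrim(nms, 3)
print(keys)

# The same grouping expressed numerically, as a floor to one decimal place
floor_keys <- sprintf("%.1f", floor(as.numeric(nms) * 10) / 10)
print(identical(keys, floor_keys))
```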

Alternatively, using your original data, you just need to be a little careful and remember to drop the "Class1" column when taking `rowSums`:

mymatch <- strtrim(names(mydf), 3)[-1]
cbind(mydf[1], 
      sapply(unique(mymatch), 
             function(y) rowSums(mydf[-1][, mymatch == y, drop = FALSE])))
#   Class1 0.4 0.5 0.6
# 1   DAWO   5   0   8
# 2   DRWO   6   3   1
# 3   DHWO   3   0   0

Finally, the classic "reshape2" approach involves `melt` followed by `dcast`:

> library(reshape2)
> Stacked <- melt(mydf)
Using Class1 as id variables
> dcast(Stacked, Class1 ~ strtrim(variable, 3), fun.aggregate=sum)
  Class1 0.4 0.5 0.6
1   DAWO   5   0   8
2   DHWO   3   0   0
3   DRWO   6   3   1

For the last two examples, `mydf` is defined as:

mydf <- structure(list(Class1 = structure(c(1L, 3L, 2L), .Label = c("DAWO", 
"DHWO", "DRWO"), class = "factor"), `0.438` = c(2L, 1L, 1L), 
    `0.441` = c(2L, 1L, 2L), `0.442` = c(0L, 3L, 0L), `0.444` = c(1L, 
    1L, 0L), `0.545` = c(0L, 1L, 0L), `0.546` = c(0L, 1L, 0L), 
    `0.548` = c(0L, 1L, 0L), `0.609` = c(1L, 0L, 0L), `0.651` = c(1L, 
    0L, 0L), `0.652` = c(5L, 1L, 0L), `0.655` = c(1L, 0L, 0L)), 
.Names = c("Class1", "0.438", "0.441", "0.442", "0.444", "0.545", "0.546", 
"0.548", "0.609", "0.651", "0.652", "0.655"), class = "data.frame", 
row.names = c(NA, -3L))
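If reshape2 is not available, the melt-then-cast idea can be approximated in base R. The following is a rough sketch rather than a drop-in replacement for `melt`/`dcast`: it builds the long table by hand, derives a `bin` column with `strtrim`, and lets `xtabs` do the summing. The names `long`, `bin`, and `wide` are made up for this illustration, and `mydf` is rebuilt with `read.table` so the snippet is self-contained.

```r
mydf <- read.table(header = TRUE, check.names = FALSE, text = "Class1 0.438 0.441 0.442 0.444 0.545 0.546 0.548 0.609 0.651 0.652 0.655
DAWO 2 2 0 1 0 0 0 1 1 5 1
DRWO 1 1 3 1 1 1 1 0 0 1 0
DHWO 1 2 0 0 0 0 0 0 0 0 0")

# Long format: one row per (Class1, original column) pair
long <- data.frame(
  Class1   = rep(mydf$Class1, times = ncol(mydf) - 1),
  variable = rep(names(mydf)[-1], each = nrow(mydf)),
  value    = unlist(mydf[-1], use.names = FALSE)
)

# Grouping key: first three characters of the original column name
long$bin <- strtrim(long$variable, 3)

# Sum value within each Class1 x bin cell
wide <- xtabs(value ~ Class1 + bin, data = long)
print(wide)
```

As with the `dcast` output, the rows come back sorted by class name rather than in their original order.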
answered 2013-04-16T04:49:39.450