
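I have a table with two factor columns, and I'd like to aggregate it into a table that is easy to map onto a heat map.

For example, the table has the following format: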

 City         Date           Revenue     Costs       Manager 
 ____         ____            _______    ______       ___
 New York     Feb 1           2000        200        Stuart
 San Fran     Feb 3           1200        300        John
 Boston       Feb 1           1500        400        Mike
 Boston       Feb 1           1300        200        Cissy

and so on.

I'd like a two-dimensional table of summed Revenue in this format:

Sum Revenue  New York     San Fran     Boston   
 ____         ____           ____       ____
 Feb 1        2000             0        2800
 Feb 2          0              0          0
 Feb 3          0             1200        0  

Is there an easy way to do this, or am I stuck writing loops?
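A minimal base-R sketch of the target table without loops, using xtabs. This is an editor-supplied alternative, not the method used in the answer below. It assumes the sample rows above, and it assumes the complete list of dates and the desired city order are known in advance; supplying them as factor levels is what makes the empty Feb 2 row appear. The names d and tab are illustrative.

```r
# Sample rows from the question (Costs and Manager are not needed for the sum)
d <- data.frame(
  City    = c("New York", "San Fran", "Boston", "Boston"),
  Date    = c("Feb 1", "Feb 3", "Feb 1", "Feb 1"),
  Revenue = c(2000, 1200, 1500, 1300),
  Costs   = c(200, 300, 400, 200),
  Manager = c("Stuart", "John", "Mike", "Cissy")
)

# Assumption: the full date range and the column order are known up front.
# Factor levels with no rows (here "Feb 2") are kept by xtabs as all-zero rows.
d$Date <- factor(d$Date, levels = c("Feb 1", "Feb 2", "Feb 3"))
d$City <- factor(d$City, levels = c("New York", "San Fran", "Boston"))

# With a left-hand side, xtabs sums Revenue within each Date x City cell;
# cells with no matching rows are 0.
tab <- xtabs(Revenue ~ Date + City, data = d)
print(tab)
```

The two Boston / Feb 1 rows are summed to 2800, and cells without data come out as 0 rather than NA, which matches the desired layout.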


1 Answer

As @Arun suggested in the comments, reshape will do this for you.

d<-read.table(text='City         Date           Revenue     Costs
"New York"     "Feb 1"           2000        200
"San Fran"     "Feb 3"           1200        300
Boston       "Feb 1"           1500        400', header=TRUE)
reshape(d[! names(d) %in% 'Costs'], idvar='Date', timevar='City', direction='wide')
#    Date Revenue.New York Revenue.San Fran Revenue.Boston
# 1 Feb 1             2000               NA           1500
# 2 Feb 3               NA             1200             NA
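Since the goal is a heat map, it may help to turn the wide data frame into a plain numeric matrix with dates as row names. The sketch below assumes the same three-row d as above; the names wide and m and the prefix-stripping step are illustrative choices, not part of the original answer.

```r
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE, stringsAsFactors=FALSE)
wide <- reshape(d[!names(d) %in% 'Costs'], idvar='Date', timevar='City', direction='wide')

# Numeric matrix for plotting: drop the Date column, use it as row names,
# and strip the "Revenue." prefix that reshape added to the column names
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$Date)
colnames(m) <- sub("^Revenue\\.", "", colnames(m))
print(m)
```

Cells without data are still NA at this point; the NA-to-0 replacement described in the answer applies to m in the same way.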

If you need to combine multiple entries for the same city and date first, you can use aggregate.

d<-read.table(text='City         Date           Revenue     Costs
"New York"     "Feb 1"           2000        200
"New York"     "Feb 1"           1000        100
"San Fran"     "Feb 3"           1200        300
Boston       "Feb 1"           1500        400', header=TRUE)
dd<-with(d, aggregate(Revenue, by=list(City=City, Date=Date), sum))
#     City     Date  x
# 1   Boston   Feb 1 1500
# 2 New York   Feb 1 3000
# 3 San Fran   Feb 3 1200
ddd<-reshape(dd, idvar='Date', timevar='City', direction='wide')
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 3 Feb 3       NA         NA       1200
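For comparison, tapply can do the aggregate and reshape steps in a single call. This is a swapped-in alternative, sketched on the same four-row d; it assumes R >= 3.4.0, where tapply has the default argument used here to fill empty cells with 0 instead of NA.

```r
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE, stringsAsFactors=FALSE)

# Sum Revenue over every Date x City cell in one step; cells with no rows
# get the value of `default` (0) instead of NA
m <- with(d, tapply(Revenue, list(Date, City), sum, default = 0))
print(m)
```

The result is a matrix with dates as rows and cities as columns, each sorted alphabetically. A date with no rows at all (such as Feb 2) is still absent unless Date is first converted to a factor whose levels include it.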

Then replace the NAs with 0:

ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 3 Feb 3        0          0       1200

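To address the issue @Arun raised below (dates with no data at all, such as Feb 2, do not appear in the result), you can use the merge function to fill in the missing dates before the NA-replacement step.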

missing.Dates <- c('Feb 2')
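ddd<-merge(ddd, data.frame(Date=missing.Dates), by='Date', all=TRUE)
# The row order below comes from an older R where Date was a factor; in R >= 4.0
# (character Date) merge sorts the rows as Feb 1, Feb 2, Feb 3.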
#   Date x.Boston x.New York x.San Fran
#1 Feb 1     1500       3000         NA
#2 Feb 3       NA         NA       1200
#3 Feb 2       NA         NA         NA
ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 2 Feb 3        0          0       1200
# 3 Feb 2        0          0          0
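Rather than typing missing.Dates by hand, it can be computed from a complete list of dates with setdiff, and the rows can then be put into calendar order with match. A minimal sketch, assuming a hypothetical all.Dates vector that lists every date the heat map should cover; the remaining steps repeat the answer's aggregate, reshape and merge on the same four-row d.

```r
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE, stringsAsFactors=FALSE)
dd <- with(d, aggregate(Revenue, by=list(City=City, Date=Date), sum))
ddd <- reshape(dd, idvar='Date', timevar='City', direction='wide')

# Hypothetical: the complete list of dates the heat map should cover
all.Dates <- c("Feb 1", "Feb 2", "Feb 3")
# Dates with no data, computed instead of written out by hand
missing.Dates <- setdiff(all.Dates, ddd$Date)
ddd <- merge(ddd, data.frame(Date=missing.Dates, stringsAsFactors=FALSE),
             by='Date', all=TRUE)
# Reorder rows to follow all.Dates, independent of how merge sorted them
ddd <- ddd[match(all.Dates, ddd$Date), ]
ddd[is.na(ddd)] <- 0
print(ddd)
```

Ordering by match against all.Dates keeps the rows in the intended order even for labels that do not sort correctly as text (for example "Feb 10" would sort before "Feb 2").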
answered 2013-03-22T13:57:10.410