You can do this while sticking with base R's `reshape()`, `aggregate()`, and `merge()` functions.
Here is a minimal example.
First, some sample data:
set.seed(1) # So you can get the same results that I do
myDF <- data.frame(id = rep(c("a", "b", "c"), each = 48),
                   datestamp = rep(c("20120101", "20120102"), each = 24),
                   hrofday = rep(0:23, times = 6),
                   val1 = runif(144, min = 0, max = 10),
                   val2 = runif(144, min = 5, max = 15),
                   val3 = runif(144, min = 0, max = 5))
list(head(myDF), tail(myDF))
# [[1]]
# id datestamp hrofday val1 val2 val3
# 1 a 20120101 0 2.655087 12.293096 0.6611409
# 2 a 20120101 1 3.721239 9.525708 1.1065296
# 3 a 20120101 2 5.728534 6.751268 1.1319040
# 4 a 20120101 3 9.082078 12.466983 0.6570827
# 5 a 20120101 4 2.016819 6.049876 4.9078173
# 6 a 20120101 5 8.983897 13.645449 1.6350686
#
# [[2]]
# id datestamp hrofday val1 val2 val3
# 139 c 20120102 18 9.850952 13.803191 0.4265550
# 140 c 20120102 19 5.076418 8.730634 4.6628596
# 141 c 20120102 20 6.827881 5.479591 4.1919203
# 142 c 20120102 21 6.015412 6.386282 4.3971665
# 143 c 20120102 22 2.388687 8.214921 4.6785623
# 144 c 20120102 23 2.581659 6.548316 0.3623032
#
Second, create the objects to be merged:
## Use `aggregate` to get the totals for `val2` and `val3`. I used the
## `list` structure to be able to define my desired column names
myAggregates <- aggregate(list(total.val2 = myDF$val2, total.val3 = myDF$val3),
                          list(id = myDF$id, datestamp = myDF$datestamp),
                          sum, na.rm = TRUE)
myAggregates
# id datestamp total.val2 total.val3
# 1 a 20120101 229.0276 46.44113
# 2 b 20120101 234.9122 61.15198
# 3 c 20120101 238.5162 61.95309
# 4 a 20120102 269.6523 70.49336
# 5 b 20120102 238.5868 61.07377
# 6 c 20120102 198.4762 67.97553
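As an aside, the same totals can probably be obtained with the formula interface to `aggregate()`, which some people find easier to read. The sketch below rebuilds the sample data so it runs on its own; the name `myAggregatesF` is only illustrative. One caveat: instead of `na.rm = TRUE`, the formula interface by default drops any row that has an `NA` in one of the variables used, so the two approaches can differ when the data contain missing values.

```r
# Recreate the sample data so this block is self-contained
set.seed(1)
myDF <- data.frame(id = rep(c("a", "b", "c"), each = 48),
                   datestamp = rep(c("20120101", "20120102"), each = 24),
                   hrofday = rep(0:23, times = 6),
                   val1 = runif(144, min = 0, max = 10),
                   val2 = runif(144, min = 5, max = 15),
                   val3 = runif(144, min = 0, max = 5))

# Formula interface: the names given inside cbind() become the
# names of the aggregated columns; the right-hand side lists the groups
myAggregatesF <- aggregate(cbind(total.val2 = val2, total.val3 = val3) ~ id + datestamp,
                           data = myDF, FUN = sum)
myAggregatesF
```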
## Use `reshape()` to change from long to wide. Drop `val2` and `val3`
## before reshaping (can be done many ways, I did it here by name matching)
myDFwide <- reshape(myDF[!names(myDF) %in% c("val2", "val3")], direction = "wide",
                    idvar = c("id", "datestamp"), timevar = "hrofday")
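If the `reshape()` arguments are hard to picture, a toy example may help. The data frame `long` below is made up purely for illustration: `idvar` names the column(s) that identify one output row, `timevar` names the column whose values are spread across new columns, and every remaining column gets a `.<time>` suffix.

```r
# Hypothetical toy data: two ids, two hours each
long <- data.frame(id = c("a", "a", "b", "b"),
                   hr = c(0, 1, 0, 1),
                   val1 = c(1.5, 2.5, 3.5, 4.5))

# One row per `id`; one `val1.<hr>` column per distinct value of `hr`
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "hr")
names(wide)
```

In the real data the same idea applies with two identifying columns (`id` and `datestamp`) and 24 distinct values of `hrofday`, which is where the columns `val1.0` through `val1.23` come from.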
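Third, use `merge()` to combine these two `data.frame`s. I have posted the output of `str()` so you can see the variable names and the type of content each one holds. (The output below was produced with a version of R where `data.frame()` converted strings to factors; from R 4.0.0 onward, `id` and `datestamp` would show up as `chr` instead.)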
myDF2 <- merge(myDFwide, myAggregates)
str(myDF2)
# 'data.frame': 6 obs. of 28 variables:
# $ id : Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3
# $ datestamp : Factor w/ 2 levels "20120101","20120102": 1 2 1 2 1 2
# $ val1.0 : num 2.66 2.67 7.32 3.47 4.55 ...
# $ val1.1 : num 3.72 3.86 6.93 3.34 4.1 ...
# $ val1.2 : num 5.729 0.134 4.776 4.764 8.109 ...
# $ val1.3 : num 9.08 3.82 8.61 8.92 6.05 ...
# $ val1.4 : num 2.02 8.7 4.38 8.64 6.55 ...
# $ val1.5 : num 8.98 3.4 2.45 3.9 3.53 ...
# $ val1.6 : num 9.447 4.821 0.707 7.773 2.703 ...
# $ val1.7 : num 6.608 5.996 0.995 9.606 9.927 ...
# $ val1.8 : num 6.29 4.94 3.16 4.35 6.33 ...
# $ val1.9 : num 0.618 1.862 5.186 7.125 2.132 ...
# $ val1.10 : num 2.06 8.27 6.62 4 1.29 ...
# $ val1.11 : num 1.77 6.68 4.07 3.25 4.78 ...
# $ val1.12 : num 6.87 7.94 9.13 7.57 9.24 ...
# $ val1.13 : num 3.84 1.08 2.94 2.03 5.99 ...
# $ val1.14 : num 7.7 7.24 4.59 7.11 9.76 ...
# $ val1.15 : num 4.98 4.11 3.32 1.22 7.32 ...
# $ val1.16 : num 7.18 8.21 6.51 2.45 3.57 ...
# $ val1.17 : num 9.92 6.47 2.58 1.43 4.31 ...
# $ val1.18 : num 3.8 7.83 4.79 2.4 1.48 ...
# $ val1.19 : num 7.774 5.53 7.663 0.589 0.131 ...
# $ val1.20 : num 9.347 5.297 0.842 6.423 7.156 ...
# $ val1.21 : num 2.12 7.89 8.75 8.76 1.03 ...
# $ val1.22 : num 6.517 0.233 3.391 7.789 4.463 ...
# $ val1.23 : num 1.26 4.77 8.39 7.97 6.4 ...
# $ total.val2: num 229 270 235 239 239 ...
# $ total.val3: num 46.4 70.5 61.2 61.1 62 ...
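By default `merge()` joins on every column name the two data frames share, which here is `id` and `datestamp`. If you would rather not rely on that default, a sketch of the whole pipeline with the key columns spelled out might look like the following; the `by` argument is the only change, and `myDF2b` is just an illustrative name.

```r
# Recreate the sample data so this block is self-contained
set.seed(1)
myDF <- data.frame(id = rep(c("a", "b", "c"), each = 48),
                   datestamp = rep(c("20120101", "20120102"), each = 24),
                   hrofday = rep(0:23, times = 6),
                   val1 = runif(144, min = 0, max = 10),
                   val2 = runif(144, min = 5, max = 15),
                   val3 = runif(144, min = 0, max = 5))

# Daily totals of val2 and val3 per id
myAggregates <- aggregate(list(total.val2 = myDF$val2, total.val3 = myDF$val3),
                          list(id = myDF$id, datestamp = myDF$datestamp),
                          sum, na.rm = TRUE)

# Hourly val1 values spread into 24 columns per id and day
myDFwide <- reshape(myDF[!names(myDF) %in% c("val2", "val3")], direction = "wide",
                    idvar = c("id", "datestamp"), timevar = "hrofday")

# Name the join keys explicitly instead of relying on shared column names
myDF2b <- merge(myDFwide, myAggregates, by = c("id", "datestamp"))
dim(myDF2b)
```

Naming the keys makes the join robust if another column with a shared name is added to either data frame later.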