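I have a list of three data frames and want to produce another list of three data frames whose rows consist of each value of a grouping variable (g1) together with the means of six variables, computed by g1. The twist is that I only want to compute the mean of each of the three continuous variables over the rows where the corresponding dummy variable equals 1.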
Reproducible example:
a <- data.frame(c("fj","fj","fj","a","fj","a","g","g","g","g"),c(1,1,1,1,0,0,0,1,0,0),c(0,0,1,0,1,0,0,1,0,1),c(0,0,0,1,0,0,1,1,0,0),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)))
b <- data.frame(c("fj","a","fj","a","fj","fj","fj","g","g","g"),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)))
c <- data.frame(c("fj","fj","fj","a","fj","a","g","g","g","g"),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 0, max = 2)),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)),floor(runif(10, min = 10, max = 200)))
u <- list(a,b,c)
u <- lapply(u, setNames, nm = c('g1','dummy1','dummy2','dummy3','contin1','contin2','contin3'))
> u[[1]]
   g1 dummy1 dummy2 dummy3 contin1 contin2 contin3
1  fj      1      0      0     199      18      61
2  fj      1      0      0      91     158      28
3  fj      1      1      0     147      67     190
4   a      1      0      1     181     105      22
5  fj      0      1      0      14      16     156
6   a      0      0      0     178      14      98
7   g      0      0      1     116      97      30
8   g      1      1      1      48      31     144
9   g      0      0      0      60      21     112
10  g      0      1      0      95     145     199
I want the mean of contin1 computed only over rows where dummy1 == 1, the mean of contin2 only where dummy2 == 1, and the mean of contin3 only where dummy3 == 1.
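To make the rule concrete, here is a small check for a single variable. It is only a sketch: the vectors are typed in from the printed first data frame, and `cond_mean` is a hypothetical helper name used for illustration, not something from my actual code.

```r
# Three columns copied from the printed u[[1]]
g1      <- c("fj","fj","fj","a","fj","a","g","g","g","g")
dummy2  <- c(0,0,1,0,1,0,0,1,0,1)
contin2 <- c(18,158,67,105,16,14,97,31,21,145)

# Hypothetical helper: mean of contin2 over the rows of one group
# where dummy2 == 1
cond_mean <- function(group) mean(contin2[g1 == group & dummy2 == 1])

cond_mean("fj")  # rows 3 and 5: (67 + 16) / 2
cond_mean("g")   # rows 8 and 10: (31 + 145) / 2
cond_mean("a")   # no qualifying rows: mean(numeric(0)) is NaN
```

Group a has no rows with dummy2 == 1, which is why I expect a missing value in that cell rather than a dropped row.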
Desired output for the first list element:
> rates
[[1]]
  x[, 1]   V1  V2  V3 x[, 1] x[, 5] x[, 1] x[, 6] x[, 1] x[, 7]
1      a 0.50 0.0 0.5      a    181      a     NA      a     22
2     fj 0.75 0.5 0.0     fj 145.67     fj   41.5     fj     NA
3      g 0.25 0.5 0.5      g     48      g     88      g     87
What I tried:
rates <- lapply(u, function(x) {
  # columns: 1 = g1, 2:4 = dummy1..dummy3, 5:7 = contin1..contin3
  cbind(aggregate(cbind(x[,2],x[,3],x[,4]) ~ x[,1], FUN = mean, na.action = NULL),
        aggregate(x[,5] ~ x[,1], FUN = mean, na.action = NULL, subset = (x[,2] == 1)),
        aggregate(x[,6] ~ x[,1], FUN = mean, na.action = NULL, subset = (x[,3] == 1)),
        aggregate(x[,7] ~ x[,1], FUN = mean, na.action = NULL, subset = (x[,4] == 1)))
})
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 3, 2
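I understand that this error comes from cbind, which fails whenever the objects being bound have different numbers of rows. (Here the contin1 result has three rows, while the contin2 and contin3 results have only two each, because a group with no rows satisfying the subset condition is dropped.) I was hoping aggregate had some way to keep one row per level of the grouping variable, so that every result would have the same number of rows and cbind would work. Judging by the R documentation, though, this may not be possible: "Rows with missing values in any of the by variables will be omitted from the result."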
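For what it's worth, one direction that might avoid the mismatch is to skip `subset` entirely: set each continuous value to NA wherever its dummy is not 1, then take NA-ignoring means so that every group keeps a row. The following is only a sketch under assumptions. The data frame is typed in from the printed first element (the `runif()` calls are unseeded), `masked_means` is a hypothetical helper name, and it relies on the column names set with `setNames` above.

```r
# First list element, copied from the printed output
d <- data.frame(
  g1      = c("fj","fj","fj","a","fj","a","g","g","g","g"),
  dummy1  = c(1,1,1,1,0,0,0,1,0,0),
  dummy2  = c(0,0,1,0,1,0,0,1,0,1),
  dummy3  = c(0,0,0,1,0,0,1,1,0,0),
  contin1 = c(199,91,147,181,14,178,116,48,60,95),
  contin2 = c(18,158,67,105,16,14,97,31,21,145),
  contin3 = c(61,28,190,22,156,98,30,144,112,199)
)
u <- list(d)  # stands in for list(a, b, c)

# Hypothetical helper: blank out each contin* value whose dummy is not 1,
# then compute NA-ignoring group means over all six variables.
masked_means <- function(x) {
  dummies <- c("dummy1", "dummy2", "dummy3")
  contins <- c("contin1", "contin2", "contin3")
  for (i in seq_along(contins)) {
    x[[contins[i]]][x[[dummies[i]]] != 1] <- NA
  }
  # na.pass keeps rows containing NA; na.rm = TRUE is forwarded to mean()
  aggregate(. ~ g1, data = x, FUN = mean, na.rm = TRUE, na.action = na.pass)
}

rates <- lapply(u, masked_means)
rates[[1]]
```

With this approach a group that has no qualifying rows shows up as NaN instead of disappearing, so there is a single data frame per list element and no cbind step at all.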
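I have read through the documentation for aggregate. The following two posts address similar problems, but neither computes the means over different subsets of the data.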
"R: Calculating means for subsets of a group" and "Means of a list of data frames in R"
Any suggestions would be greatly appreciated.