我对这个问题有跟进。
我正在创建一个以现有 data.frame 的列名和特定行条目为条件的 data.frame。下面是我使用for 循环解决它的方法(感谢@Roland 的建议……真实数据违反了@eddi 回答的要求),但它一直在实际数据集(200x500,000+ rows.cols)上运行现在两个多小时...
(以下生成的 data.frames 与实际数据非常相似。)
set.seed(1)
a <- data.frame(year=c(1986:1990),
events=round(runif(5,0,5),digits=2))
b <- data.frame(year=c(rep(1986:1990,each=2,length.out=40),1986:1990),
region=c(rep(c("x","y"),10),rep(c("y","z"),10),rep("y",5)),
state=c(rep(c("NY","PA","NC","FL"),each=10),rep("AL",5)),
events=round(runif(45,0,5),digits=2))
d <- matrix(rbinom(200,1,0.5),10,20, dimnames=list(c(1:10), rep(1986:1990,each=4)))
e <- data.frame(id=sprintf("%02d",1:10), as.data.frame(d),
region=c("x","y","x","z","z","y","y","z","y","y"),
state=c("PA","AL","NY","NC","NC","NC","FL","FL","AL","AL"))
for (i in seq_len(nrow(d))) {
for (j in seq_len(ncol(d))) {
d[i,j] <- ifelse(d[i,j]==0,
a$events[a$year==colnames(d)[j]],
b$events[b$year==colnames(d)[j] &
b$state==e$state[i] &
b$region==e$region[i]])
}
}
有没有更好/更快的方法来做到这一点?