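This is more verbose than I expected, but it does the job for your long-format snippet:
I start from the data you provided, but work in wide format rather than long.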
zz <- textConnection("id time q dv
1 1 1 1
1 2 0 1
1 3 1 1
1 4 0 0
1 5 0 NA
1 6 0 NA
2 1 1 1
2 2 1 1
2 3 0 0
2 4 0 NA
2 5 0 NA
2 6 0 NA
")
d <- read.table(zz, header = TRUE)
d$dv <- NULL
close(zz)
# start out with wide instead of long
dw <- reshape(d, direction='wide', timevar="time", sep="")
dw
## id q1 q2 q3 q4 q5 q6
## 1 1 1 0 1 0 0 0
## 7 2 1 1 0 0 0 0
Use a function that generates the appropriate "dv" values for each wide row (observation).
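censor <- function(periods) {
  # Called via apply(dw, 1, ...), so periods[1] is the id and the q values
  # occupy positions 2..n. The indexing below relies on that one-slot shift.
  n <- length(periods)
  cperiods <- periods * 1:n  # nonzero entries become their positions
  n.obs <- max(cperiods)     # shifted position of the last q=1, i.e. the slot of the closing 0
                             # (assumes the id is no larger than this position)
  n.cens <- n - n.obs - 1    # number of censored periods, less one
  c(rep(1, n.obs - 1), 0, rep(NA, n.cens + 1))  # 1s up to the last q=1, one 0, then NAs
}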
Apply this function to produce the appropriately censored wide dataset.
# apply censor() to create the dv variables in wide format
dw.censored <- data.frame(t(apply(dw, 1, FUN = censor)))
dw.censored
## X1 X2 X3 X4 X5 X6 X7
## 1 1 1 1 0 NA NA NA
## 7 1 1 0 NA NA NA NA
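The output above has seven columns (X1 to X7) because apply(dw, 1, ...) hands each full row to censor(), id included. The results are right for ids 1 and 2, but the calculation could depend on the id value when ids are larger. The following is a minimal sketch of a variant that looks only at the q columns; censor_q and dv.wide are hypothetical names, not part of the answer above, and the sketch assumes the q columns are named q1, q2, ... as produced by reshape().

```r
# Wide data as produced by reshape() above, rebuilt here so the sketch is self-contained
dw <- data.frame(id = c(1, 2),
                 q1 = c(1, 1), q2 = c(0, 1), q3 = c(1, 0),
                 q4 = 0, q5 = 0, q6 = 0)

# Hypothetical variant of censor(): expects only the q values, no id
censor_q <- function(q) {
  n <- length(q)
  last1 <- max(q * seq_len(n))             # position of the last q=1 (0 if there is none)
  c(rep(1, last1),                         # 1s up to and including the last q=1
    if (last1 < n) 0,                      # a single 0 right after it, if a period is left
    rep(NA, max(n - last1 - 1, 0)))        # remaining periods are censored
}

qcols <- grep("^q", names(dw))             # select the q columns only
dv.wide <- t(apply(dw[, qcols], 1, censor_q))
dv.wide[1, ]                               # id 1: 1 1 1 0 NA NA
dv.wide[2, ]                               # id 2: 1 1 0 NA NA NA
```

With this variant the wide result has six columns, one per time period, so no spurious seventh period appears after reshaping back to long.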
Now back to long format (with some cosmetic tidying, sorting, etc.):
dl.censored <- reshape(dw.censored, varying = 1:7, timevar = "time",
sep = "", direction = "long")
dl.censored <- dl.censored[order(dl.censored$id, dl.censored$time),]
dl.censored$dv <- dl.censored$X
rownames(dl.censored) <- dl.censored$X <- NULL
dl.censored
## time id dv
## 1 1 1 1
## 2 2 1 1
## 3 3 1 1
## 4 4 1 0
## 5 5 1 NA
## 6 6 1 NA
## 7 7 1 NA
## 8 1 2 1
## 9 2 2 1
## 10 3 2 0
## 11 4 2 NA
## 12 5 2 NA
## 13 6 2 NA
## 14 7 2 NA
And without the NAs:
dl.censored <- na.omit(dl.censored) # drop rows containing NA
dl.censored
## time id dv
## 1 1 1 1
## 2 2 1 1
## 3 3 1 1
## 4 4 1 0
## 8 1 2 1
## 9 2 2 1
## 10 3 2 0
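For comparison, the same dv values can probably be computed without leaving long format. This is only a sketch under the assumption that time runs 1, 2, ..., n within each id; it uses base R's ave() to find the time of the last q = 1 per id, and last1 is a hypothetical helper name.

```r
# Long data as in the question (without dv), rebuilt so the sketch is self-contained
d <- data.frame(id   = rep(1:2, each = 6),
                time = rep(1:6, times = 2),
                q    = c(1, 0, 1, 0, 0, 0,
                         1, 1, 0, 0, 0, 0))

# Time of the last q=1 within each id (0 if an id has no q=1)
last1 <- ave(d$time * d$q, d$id, FUN = max)

# 1 up to the last q=1, 0 in the period right after it, NA afterwards
d$dv <- ifelse(d$time <= last1, 1,
               ifelse(d$time == last1 + 1, 0, NA))

na.omit(d)   # keep only the uncensored rows
```

The wide-format route above makes the per-observation logic explicit in one function, while this version trades that for fewer reshaping steps.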