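I'm having some trouble working with longitudinal data: each row of my dataset contains a unique ID followed by a series of visit dates. Each visit has values for 3 dichotomous variables.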
data1 <- structure(list(V1date = structure(c(2L, 1L, 2L, 3L, 4L), .Label = c("1/22/12", "4/5/12", "8/18/12", "9/6/12"), class = "factor"),
V1a = structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
V1b = structure(c(2L, 1L, 1L, 1L, 1L), .Label = c("No", "Yes"), class = "factor"),
V1c = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("No", "Yes"), class = "factor"),
V2date = structure(c(1L, 2L, 4L, 3L, NA), .Label = c("6/18/12", "7/5/12", "9/22/12", "9/4/12"), class = "factor"),
V2a = structure(c(1L, 1L, 1L, 1L, NA), .Label = "Yes", class = "factor"),
V2b = structure(c(1L, 1L, 1L, 1L, NA), .Label = "No", class = "factor"),
V2c = structure(c(1L, 1L, 1L, 1L, NA), .Label = "Yes", class = "factor"),
V3date = structure(c(NA, NA, 1L, NA, 2L), .Label = c("11/1/12", "12/4/12"), class = "factor"),
V3a = structure(c(NA, NA, 1L, NA, 1L), .Label = "Yes", class = "factor"),
V3b = structure(c(NA, NA, 1L, NA, 1L), .Label = "No", class = "factor"),
V3c = structure(c(NA, NA, 2L, NA, 1L), .Label = c("No", "Yes"), class = "factor")),
.Names = c("V1date", "V1a", "V1b", "V1c", "V2date", "V2a", "V2b", "V2c", "V3date", "V3a", "V3b", "V3c"),
class = "data.frame", row.names = c("001", "002", "003", "004", "005"))
data1
     V1date V1a V1b V1c  V2date  V2a  V2b  V2c  V3date  V3a  V3b  V3c
001  4/5/12  No Yes  No 6/18/12  Yes   No  Yes    <NA> <NA> <NA> <NA>
002 1/22/12  No  No Yes  7/5/12  Yes   No  Yes    <NA> <NA> <NA> <NA>
003  4/5/12 Yes  No  No  9/4/12  Yes   No  Yes 11/1/12  Yes   No  Yes
004 8/18/12  No  No  No 9/22/12  Yes   No  Yes    <NA> <NA> <NA> <NA>
005  9/6/12 Yes  No  No    <NA> <NA> <NA> <NA> 12/4/12  Yes   No   No
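Of the 8 possible combinations of the three variables, 4 are "abnormal" and the other 4 are "normal". Everyone starts out abnormal and then either stays abnormal on subsequent visits or resolves to a normal pattern at a later visit. (I'm ignoring reversion to abnormal: once an ID is normal, it stays normal.)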
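Ultimately I need to add 4 new columns to the right of the data frame, indicating 1) the date of the last completed visit (regardless of intervening NAs), 2) whether the ID eventually resolved or remained abnormal, 3) if resolved, what the resolution pattern was, and 4) what the resolution date was. NAs always come in groups of 4 (i.e. no visit date and no values for the 3 variables) and should be ignored.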
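For example, if the patterns "yes-yes-no", "yes-no-yes", "no-yes-yes" and "yes-yes-yes" are all normal and the remaining patterns are all abnormal, the result would be the following four additional columns: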
data2 <- structure(list(
LastVisDate = structure(c(3L, 2L, 3L, 3L, 2L), .Label = c("6/18/12", "12/4/12", "11/1/12", "9/22/12"), class = "factor"),
Resolved = structure(c(2L, 2L, 2L, 2L, 1L), .Label = c("No", "Yes"), class = "factor"),
Pattern = structure(c(1L, 1L, 1L, 1L, NA), .Label = "yny", class = "factor"),
Resdate = structure(c(1L, 2L, 3L, 4L, NA), .Label = c("6/18/12", "7/5/12", "9/4/12", "9/22/12"), class = "factor")),
.Names = c("LastVisDate", "Resolved", "Pattern", "Resdate"),
class = "data.frame", row.names = c("001", "002", "003", "004", "005"))
data2
    LastVisDate Resolved Pattern Resdate
001     11/1/12      Yes     yny 6/18/12
002     12/4/12      Yes     yny  7/5/12
003     11/1/12      Yes     yny  9/4/12
004     11/1/12      Yes     yny 9/22/12
005     12/4/12       No    <NA>    <NA>
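The left-to-right scan described above could be sketched roughly as follows. This is only a minimal sketch under assumptions not stated in the post: the helper name `summarise_visits` and its `normal` argument are hypothetical, the columns are assumed to follow the `V<k>date`, `V<k>a`, `V<k>b`, `V<k>c` naming used in `data1`, and visits are assumed to run in chronological order from left to right. For each ID it walks the visits in order, skips all-NA visits, keeps the last non-missing date, and stops updating the resolution once the first normal pattern is found.

```r
# Hypothetical helper (not from the original post): one summary row per ID.
# Assumes columns named V<k>date, V<k>a, V<k>b, V<k>c and values "Yes"/"No".
summarise_visits <- function(df,
                             normal = c("yyn", "yny", "nyy", "yyy")) {
  # Visit numbers, taken from the V<k>date column names.
  date_cols <- grep("^V[0-9]+date$", names(df), value = TRUE)
  visits <- sort(as.integer(sub("^V([0-9]+)date$", "\\1", date_cols)))

  one_id <- function(i) {
    last_date <- NA_character_
    pattern   <- NA_character_
    res_date  <- NA_character_
    for (k in visits) {
      d <- as.character(df[i, paste0("V", k, "date")])
      if (is.na(d)) next          # missed visit: skip the whole block of 4 NAs
      last_date <- d              # latest completed visit seen so far
      if (!is.na(pattern)) next   # already resolved: ignore later reversions
      abc <- vapply(c("a", "b", "c"),
                    function(s) as.character(df[i, paste0("V", k, s)]),
                    character(1))
      p <- paste0(ifelse(abc == "Yes", "y", "n"), collapse = "")
      if (p %in% normal) {        # stopping rule: first normal pattern
        pattern  <- p
        res_date <- d
      }
    }
    data.frame(LastVisDate = last_date,
               Resolved    = if (is.na(pattern)) "No" else "Yes",
               Pattern     = pattern,
               Resdate     = res_date,
               stringsAsFactors = FALSE)
  }

  out <- do.call(rbind, lapply(seq_len(nrow(df)), one_id))
  rownames(out) <- rownames(df)
  out
}
```

Under these assumptions, `cbind(data1, summarise_visits(data1))` would append the four columns. Note that `LastVisDate` here is each ID's last non-missing visit date, so for IDs 001, 002 and 004 it comes out as 6/18/12, 7/5/12 and 9/22/12 rather than the values shown in the illustration.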
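I've spent a lot of time on this project, but I can't figure out how to get R to march rightward through the dataset until my stopping rule is met. Suggestions are much appreciated.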