I am trying to join a table of periods like this one:
library(data.table)
id = c(rep(1,3), rep(2,3), rep(3,3))
start = as.Date(c("2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13", "2014-07-01", "2015-03-12", "2016-08-13"))
end = as.Date(c("2015-03-11", "2015-08-12", "2018-12-31", "2015-03-11", "2015-08-12", "2018-12-31","2015-03-11", "2015-08-12", "2018-12-31"))
DT = data.table(id, start, end)
DT
   id      start        end
1:  1 2014-07-01 2015-03-11
2:  1 2015-03-12 2015-08-12
3:  1 2016-08-13 2018-12-31
4:  2 2014-07-01 2015-03-11
5:  2 2015-03-12 2015-08-12
6:  2 2016-08-13 2018-12-31
7:  3 2014-07-01 2015-03-11
8:  3 2015-03-12 2015-08-12
9:  3 2016-08-13 2018-12-31
with a table of clinical records (weight and height) like this one:
id_clin = c(rep(1,2), rep(2,3), rep(3,4))
date = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01", "2014-08-01", "2015-02-01", "2017-06-01", "2018-03-05", "2018-09-01", "2018-11-30"))
weight = c(60, 65, 62, 75, 68, 90, 102, 104, 98)
height = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
DT_clin = data.table(id_clin, date, weight, height)
DT_clin
   id_clin       date weight height
1:       1 2014-10-23     60    160
2:       1 2016-09-01     65    160
3:       2 2017-01-01     62    170
4:       2 2014-08-01     75    175
5:       2 2015-02-01     68    170
6:       3 2017-06-01     90    200
7:       3 2018-03-05    102    200
8:       3 2018-09-01    104    200
9:       3 2018-11-30     98    200
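- When a clinical record (DT_clin) for an id has a date between the start and end of a period (DT) for the same id, the values of that record must be joined to the period.
- If no DT_clin record falls within a DT period, nothing needs to be joined.
- If several DT_clin records fall within the same DT period, I want to compute the mean of the overlapping values.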
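The three rules above could be sketched as a non-equi join followed by a grouped aggregation. This is only one possible approach under the stated rules, not a definitive answer. The names `joined` and `res` are introduced here for illustration, and the sample data is rebuilt so the snippet runs on its own.

```r
library(data.table)

# Sample data rebuilt from the question so the snippet is self-contained
DT <- data.table(
  id    = rep(c(1, 2, 3), each = 3),
  start = as.Date(rep(c("2014-07-01", "2015-03-12", "2016-08-13"), 3)),
  end   = as.Date(rep(c("2015-03-11", "2015-08-12", "2018-12-31"), 3))
)
DT_clin <- data.table(
  id_clin = c(rep(1, 2), rep(2, 3), rep(3, 4)),
  date    = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01",
                      "2014-08-01", "2015-02-01", "2017-06-01",
                      "2018-03-05", "2018-09-01", "2018-11-30")),
  weight  = c(60, 65, 62, 75, 68, 90, 102, 104, 98),
  height  = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
)

# Step 1: non-equi join. Every period is kept (default nomatch = NA);
# a period with several records in range yields one row per record.
joined <- DT_clin[DT,
  on = .(id_clin = id, date >= start, date <= end),
  .(id = i.id, start = i.start, end = i.end, date = x.date, weight, height)]

# Step 2: collapse to one row per period, averaging the overlapping values.
# Periods without any record stay as a single all-NA row.
res <- joined[,
  .(date = min(date), date2 = max(date),
    weight = mean(weight), height = mean(height)),
  by = .(id, start, end)]
res
```

For id 2, the first period contains two records, so the sketch returns a weight of 71.5 and a height of 172.5. For id 3, the last period contains all four records, so the mean weight comes out as 98.5; the 101.3 listed in the desired result corresponds to averaging only the last three records.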
The desired result would look like this*:
   id      start        end       date      date2 weight height
1:  1 2014-07-01 2015-03-11 2014-10-23 2014-10-23   60.0  160.0
2:  1 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
3:  1 2016-08-13 2018-12-31 2016-09-01 2016-09-01   65.0  160.0
4:  2 2014-07-01 2015-03-11 2014-08-01 2015-02-01   71.5  172.5
5:  2 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
6:  2 2016-08-13 2018-12-31 2017-01-01 2017-01-01   62.0  170.0
7:  3 2014-07-01 2015-03-11       <NA>       <NA>     NA     NA
8:  3 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
9:  3 2016-08-13 2018-12-31 2018-03-05 2018-11-30  101.3  200.0
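Also, if there is a way to apply several operations to different variables, I would be interested to know it (e.g. compute the mean of weight and the max of height while joining).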
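One way to apply a different function to each variable during the join might be a non-equi join with `by = .EACHI`, which evaluates `j` once per period of `DT`. The following is a hedged sketch on the sample data from the question; `agg`, `res`, `n_records`, `weight_mean` and `height_max` are names made up for this illustration.

```r
library(data.table)

# Sample data rebuilt from the question so the snippet is self-contained
DT <- data.table(
  id    = rep(c(1, 2, 3), each = 3),
  start = as.Date(rep(c("2014-07-01", "2015-03-12", "2016-08-13"), 3)),
  end   = as.Date(rep(c("2015-03-11", "2015-08-12", "2018-12-31"), 3))
)
DT_clin <- data.table(
  id_clin = c(rep(1, 2), rep(2, 3), rep(3, 4)),
  date    = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01",
                      "2014-08-01", "2015-02-01", "2017-06-01",
                      "2018-03-05", "2018-09-01", "2018-11-30")),
  weight  = c(60, 65, 62, 75, 68, 90, 102, 104, 98),
  height  = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
)

# by = .EACHI runs j once for each row of DT, on the DT_clin rows it matches.
# Each output column can use its own summary function.
agg <- DT_clin[DT,
  on = .(id_clin = id, date >= start, date <= end),
  .(n_records   = .N,            # number of clinical records in the period
    weight_mean = mean(weight),  # mean of weight
    height_max  = max(height)),  # max of height
  by = .EACHI]

# agg has one row per DT row, in the same order, so the summaries can be
# bound back onto the original periods.
res <- cbind(DT, agg[, .(n_records, weight_mean, height_max)])
res
```

Periods without any clinical record get `n_records` equal to 0 and `NA` for the two summaries.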
I have tested foverlaps with good results when there is only one value, but I cannot achieve my goal when several values overlap:
DT_clin[, date2 := date]  # duplicate date so each record becomes a zero-length interval
setkey(DT, id, start, end)
setkey(DT_clin, id_clin, date, date2)
foverlaps(DT[id == 1, ], DT_clin[id_clin == 1, ], by.x = c("id", "start", "end"), by.y = c("id_clin", "date", "date2"), nomatch = NA)
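If `foverlaps` is kept, one possible way to handle several overlapping records is to let it return one row per overlap and then collapse those rows per period. A sketch under that assumption, run on all ids at once; `ov` and `res` are illustrative names, and the grouping relies on `foverlaps` naming the id column after `by.x`.

```r
library(data.table)

# Sample data rebuilt from the question so the snippet is self-contained
DT <- data.table(
  id    = rep(c(1, 2, 3), each = 3),
  start = as.Date(rep(c("2014-07-01", "2015-03-12", "2016-08-13"), 3)),
  end   = as.Date(rep(c("2015-03-11", "2015-08-12", "2018-12-31"), 3))
)
DT_clin <- data.table(
  id_clin = c(rep(1, 2), rep(2, 3), rep(3, 4)),
  date    = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01",
                      "2014-08-01", "2015-02-01", "2017-06-01",
                      "2018-03-05", "2018-09-01", "2018-11-30")),
  weight  = c(60, 65, 62, 75, 68, 90, 102, 104, 98),
  height  = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
)

# foverlaps needs an interval on both sides, so the date is duplicated
DT_clin[, date2 := date]
setkey(DT_clin, id_clin, date, date2)

# One row per (period, overlapping record); periods without a record
# are kept with NA in the clinical columns because nomatch = NA
ov <- foverlaps(DT, DT_clin,
                by.x = c("id", "start", "end"),
                by.y = c("id_clin", "date", "date2"),
                nomatch = NA)

# Collapse the overlaps to one row per period
res <- ov[,
  .(date = min(date), date2 = max(date2),
    weight = mean(weight), height = mean(height)),
  by = .(id, start, end)]
res
```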
Should I use a non-equi join?
Thanks in advance for any help :)
*I duplicated the date column to create date2 and fake an interval.