Related to a question I asked here: Find whether dates overlap between multiple pairs of vectorised dates
Example of the initial data:
  person start_loc start_date end_date.1 end_date.2 end_date.3 end_loc.1 end_loc.2 end_loc.3
1      1         a 2021-02-10 2021-02-17       <NA>       <NA>         g
2      2         a 2021-01-30 2020-09-29 2020-12-12 2021-02-04         a         a         g
3      3         g 2020-12-04       <NA>       <NA>       <NA>
4      4         r 2020-12-09 2020-12-12 2020-12-14 2021-01-05         c         c         g
5      5         t 2021-03-22 2021-03-25 2021-03-29       <NA>         b         t
6      6         b 2021-04-04 2021-04-07 2021-04-09       <NA>         b         t
example <- structure(list(person = 1:6, start_loc = c("a", "a", "g", "r",
"t", "b"), start_date = structure(c(18668, 18657, 18600, 18605,
18708, 18721), class = "Date"), end_date.1 = structure(c(18675,
18534, NA, 18608, 18711, 18724), class = "Date"), end_date.2 = structure(c(NA,
18608, NA, 18610, 18715, 18726), class = "Date"), end_date.3 = structure(c(NA,
18662, NA, 18632, NA, NA), class = "Date"), end_loc.1 = c("g",
"a", "", "c", "b", "b"), end_loc.2 = c("", "a", "", "c", "t",
"t"), end_loc.3 = c("", "g", "", "g", "", "")), class = "data.frame", row.names = c(NA,
-6L))
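My data is arranged so that each person has one row, with a start_date and a start_loc. I would like to find which people have an end_date within 7 days of their start_date. If two or more end_date/end_loc pairs meet this criterion, the pair whose end_loc matches start_loc should take priority; otherwise the earliest one should be used.
So the desired output would look like: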
  person start_loc start_date end_date.1 end_date.2 end_date.3 end_loc.1 end_loc.2 end_loc.3   end_date end_loc
1      1         a 2021-02-10 2021-02-17       <NA>       <NA>         g                     2021-02-17       g
2      2         a 2021-01-30 2020-09-29 2020-12-12 2021-02-04         a         a         g       <NA>
3      3         g 2020-12-04       <NA>       <NA>       <NA>                                     <NA>
4      4         r 2020-12-09 2020-12-12 2020-12-14 2021-01-05         c         c         g 2020-12-12       c
5      5         t 2021-03-22 2021-03-25 2021-03-29       <NA>         b         t           2021-03-29       t
6      6         b 2021-04-04 2021-04-07 2021-04-09       <NA>         b         t           2021-04-07       b
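I have followed some of the techniques from the previous question, such as using c_across, across and rowwise, but I can't seem to get R to return a single output per person. Is this possible? Do I need to restructure the data into long format again?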
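To pin down the rule I am describing, here is a rough base R sketch of it rather than a finished solution. The function name pick_end_pair and its window argument are made up for illustration, and it assumes "within 7 days" means an end_date from 0 to 7 days after start_date, inclusive. The two demo rows are persons 4 and 5 from the data above.

```r
# Hypothetical helper: for each row, choose one end_date/end_loc pair.
# Assumption: "within 7 days" means 0 <= end_date - start_date <= window.
pick_end_pair <- function(df, window = 7) {
  date_cols <- grep("^end_date\\.", names(df), value = TRUE)
  loc_cols  <- sub("^end_date", "end_loc", date_cols)
  df$end_date <- as.Date(NA)
  df$end_loc  <- NA_character_
  for (i in seq_len(nrow(df))) {
    # Days between start_date and each candidate end date (NA if missing)
    gap <- vapply(date_cols, function(col) {
      as.numeric(df[[col]][i]) - as.numeric(df$start_date[i])
    }, numeric(1))
    locs <- vapply(loc_cols, function(col) df[[col]][i], character(1))
    idx <- which(!is.na(gap) & gap >= 0 & gap <= window)
    if (length(idx) == 0) next            # no qualifying pair: leave NA
    # Prefer pairs whose end_loc matches start_loc, if there are any
    same <- idx[locs[idx] %in% df$start_loc[i]]
    if (length(same) > 0) idx <- same
    best <- idx[which.min(gap[idx])]      # earliest of what remains
    df$end_date[i] <- df[[date_cols[best]]][i]
    df$end_loc[i]  <- unname(locs[best])
  }
  df
}

# Persons 4 and 5 from the example data
demo <- data.frame(
  person     = c(4L, 5L),
  start_loc  = c("r", "t"),
  start_date = as.Date(c("2020-12-09", "2021-03-22")),
  end_date.1 = as.Date(c("2020-12-12", "2021-03-25")),
  end_date.2 = as.Date(c("2020-12-14", "2021-03-29")),
  end_date.3 = as.Date(c("2021-01-05", NA)),
  end_loc.1  = c("c", "b"),
  end_loc.2  = c("c", "t"),
  end_loc.3  = c("g", "")
)
result <- pick_end_pair(demo)
result[, c("person", "end_date", "end_loc")]
```

For these two rows the sketch picks 2020-12-12 / c for person 4 (no location match, so the earliest) and 2021-03-29 / t for person 5 (location match wins over the earlier date). Under this literal reading, person 2 in example would also qualify, since 2021-02-04 is five days after 2021-01-30, even though the desired output above shows <NA> for that row, so the rule I actually need may be stricter than what this sketch assumes. Rows with no qualifying pair get NA for end_loc here rather than an empty string.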