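Suppose I have two data frames.
The first contains the "Date" on which a "Name" issued a "Rec" for an "ID", and the "Stop.Date" on which that "Rec" expires.
df (only a portion)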
structure(list(Date = structure(c(13236, 13363, 14074, 13199,
14554), class = "Date"), ID = c("AU0000XINAA9", "AU0000XINAA9",
"AU0000XINAC5", "AU0000XINAI2", "AU0000XINAJ0"), Name = c("N+1 BREWIN",
"N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.", "INVESTEC BANK (UK) PLC",
"AWRAQ INVESTMENTS"), Rec = c(1, 2, 2, 2, 1), Stop.Date = structure(c(13363,
13509, 14937, 13230, 16702), class = "Date")), .Names = c("Date",
"ID", "Name", "Rec", "Stop.Date"), class = c("data.table", "data.frame"
), row.names = c(NA, -5L))
The second data frame contains only a time series of daily dates: in this case, from 2006-02-20 to the end of 2006.
df2
Date1
1: 2006-02-20
2: 2006-02-21
3: 2006-02-22
4: 2006-02-23
5: 2006-02-24
---
311: 2006-12-27
312: 2006-12-28
313: 2006-12-29
314: 2006-12-30
315: 2006-12-31
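Now I want my code to sum all "Rec" values, grouped by ID and Name, whenever the "Date1" variable in df2 falls within the time range (from Date up to Stop.Date).
I found the post "R - sum if date is within range", which seems very close to my problem, but its solution does not take any groups into account.
I would like to end up with a data.frame that shows, for every date in df2, the sum of "Rec" for each "ID".
Expected output, e.g.: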
Date1 ID SumRec
1 2006-02-20 AU0000XINAI2 2
2 2006-02-21 AU0000XINAI2 2
...
4 2006-03-29 AU0000XINAA9 1
5 2006-03-30 AU0000XINAA9 1
6 2006-08-03 AU0000XINAA9 2 # since Date1 2006-08-03 is at the end
of range in df (row#1)-> it falls
within range in df (row#2)
...
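A minimal sketch of one way the rule above might be expressed, using a data.table non-equi join. It assumes the range is half-open (Date <= Date1 < Stop.Date), which is only inferred from the 2006-08-03 row of the expected output; the names `hits` and `out` are hypothetical, and df2 is rebuilt here as a daily sequence.

```r
library(data.table)

# Sample df from above, rebuilt from its day counts since 1970-01-01.
df <- data.table(
  Date      = as.Date(c(13236, 13363, 14074, 13199, 14554), origin = "1970-01-01"),
  ID        = c("AU0000XINAA9", "AU0000XINAA9", "AU0000XINAC5",
                "AU0000XINAI2", "AU0000XINAJ0"),
  Name      = c("N+1 BREWIN", "N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.",
                "INVESTEC BANK (UK) PLC", "AWRAQ INVESTMENTS"),
  Rec       = c(1, 2, 2, 2, 1),
  Stop.Date = as.Date(c(13363, 13509, 14937, 13230, 16702), origin = "1970-01-01")
)

# Assumption: df2 is one row per calendar day, as in the printout above.
df2 <- data.table(Date1 = seq(as.Date("2006-02-20"), as.Date("2006-12-31"), by = "day"))

# Non-equi join: keep each (Date1, df row) pair with Date <= Date1 < Stop.Date.
# Dates that match no row of df are dropped by nomatch = NULL.
hits <- df[df2,
           on = .(Date <= Date1, Stop.Date > Date1),
           .(Date1 = i.Date1, ID = x.ID, Name = x.Name, Rec = x.Rec),
           nomatch = NULL]

# Sum Rec per date and ID; add Name to `by` to sum per ID and Name instead.
out <- hits[, .(SumRec = sum(Rec)), by = .(Date1, ID)]
```

If the end date is meant to be inclusive, `Stop.Date >= Date1` would be the condition to use, but then 2006-08-03 would pick up both AU0000XINAA9 rows and sum to 3 rather than 2.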
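Keep in mind that this is only a small part of the data. Usually there are many more Recs from different "Names" for each "ID" (which is when the sum function starts to make sense).
Thank you very much in advance for your help.
UPDATE
New data frames: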
df
structure(list(Date = structure(c(9905, 10381, 10381, 10954,
10584, 10632, 10778, 10520, 10631, 10905), class = "Date"), ID = c("BMG4593F1389",
"BMG4593F1389", "BMG4593F1389", "BMG4593F1389", "BMG4593F1389",
"BMG4593F1389", "BMG4593F1389", "BMG526551004", "BMG526551004",
"BMG526551004"), Name = c("ING FM", "Permission Denied 128064",
"Permission Denied 2880", "Permission Denied 2880", "Permission Denied 32",
"Permission Denied 888", "Permission Denied 888", "Permission Denied 2880",
"Permission Denied 2880", "Permission Denied 2880"), Rec = c(2,
3, 2, 2, 3, 3, 3, 1, 3, 3), Stop.Date = structure(c(12095, 11232,
10954, 11180, 11345, 10764, 11667, 10631, 10905, 11087), class = "Date")), .Names = c("Date",
"ID", "Name", "Rec", "Stop.Date"), class = c("data.table", "data.frame"
), row.names = c(NA, -10L))
df2
structure(list(Date1 = structure(c(10954, 10955, 10956, 10957,
10958, 10959), class = "Date")), .Names = "Date1", row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
If I now run the following code:
> library(data.table)
> library(lubridate)  # provides interval() and %within%
>
> df=df[,interval := interval(df$Date, df$Stop.Date)]
>
> df1 <- do.call(rbind, lapply(df2$Date1, function(x){
>   index <- x %within% df$interval
>   list(ID = ifelse(any(index), df$ID[index], NA),
>        Rec = ifelse(any(index), df$Rec[index], NA),
>        Name = ifelse(any(index), df$Name[index], NA),
>        interval = ifelse(any(index), df$interval[index], NA))}))
>
> df3 <- cbind(df2, df1)
I get the following result:
Date1 ID Rec Name interval
1: 1999-12-29 BMG4593F1389 2 ING FM 189216000
2: 1999-12-30 BMG4593F1389 2 ING FM 189216000
3: 1999-12-31 BMG4593F1389 2 ING FM 189216000
4: 2000-01-01 BMG4593F1389 2 ING FM 189216000
5: 2000-01-02 BMG4593F1389 2 ING FM 189216000
6: 2000-01-03 BMG4593F1389 2 ING FM 189216000
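But df2$Date1 "1999-12-29", for example, falls within the date ranges of several other entries in df as well (issued by different df$Name values for df$ID "BMG4593F1389", plus one for a second ID), so for this particular df2$Date1 there should be 6 rows:
Expected result for the date 1999-12-29 (for simplicity, the df3$interval variable is ignored here)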
Date1 ID Rec Name
1: 1999-12-29 BMG4593F1389 2 ING FM
2: 1999-12-29 BMG4593F1389 3 Permission Denied 128064
3: 1999-12-29 BMG4593F1389 2 Permission Denied 2880
4: 1999-12-29 BMG4593F1389 3 Permission Denied 32
5: 1999-12-29 BMG4593F1389 3 Permission Denied 888
6: 1999-12-29 BMG526551004 3 Permission Denied 2880
7: 1999-12-30 BMG4593F1389 2 ING FM
... etc
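A rough sketch of how rows like these might be produced with a data.table non-equi join instead of the lapply() loop. It again assumes a half-open range (Date <= Date1 < Stop.Date); under that assumption the "Permission Denied 2880" row that stops on 1999-12-29 is excluded, which gives the 6 rows listed for that date. The name `expanded` is hypothetical.

```r
library(data.table)

# Updated df from above, rebuilt from its day counts since 1970-01-01.
df <- data.table(
  Date      = as.Date(c(9905, 10381, 10381, 10954, 10584, 10632, 10778,
                        10520, 10631, 10905), origin = "1970-01-01"),
  ID        = c(rep("BMG4593F1389", 7), rep("BMG526551004", 3)),
  Name      = c("ING FM", "Permission Denied 128064", "Permission Denied 2880",
                "Permission Denied 2880", "Permission Denied 32",
                "Permission Denied 888", "Permission Denied 888",
                "Permission Denied 2880", "Permission Denied 2880",
                "Permission Denied 2880"),
  Rec       = c(2, 3, 2, 2, 3, 3, 3, 1, 3, 3),
  Stop.Date = as.Date(c(12095, 11232, 10954, 11180, 11345, 10764, 11667,
                        10631, 10905, 11087), origin = "1970-01-01")
)
df2 <- data.table(Date1 = as.Date(10954:10959, origin = "1970-01-01"))

# One output row per (Date1, matching df row): each date in df2 is repeated
# once for every df entry whose range contains it.
expanded <- df[df2,
               on = .(Date <= Date1, Stop.Date > Date1),
               .(Date1 = i.Date1, ID = x.ID, Rec = x.Rec, Name = x.Name),
               nomatch = NULL]
```

Unlike ifelse(), which returns a result the same length as its test (here the single value any(index)) and therefore keeps only the first match, the join keeps every matching row.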
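So in the end I need to repeat each date in df2$Date1 once for every Name that has issued a Rec for a particular df$ID, as long as the date falls within the corresponding date range.
Can anyone help me?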