Suppose I have these two data.tables:

  A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                  "id" = c(1,2,1,1,2))

  B <- data.table(idd = c(1,1,1,1,1),
                  datee =  c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                  value = c(1,2,3,4,5))
> A
         date id
1: 2003-05-24  1
2: 2003-06-05  2
3: 2003-06-24  1
4: 2003-06-25  1
5: 2003-06-27  2

> B
   idd      datee value
1:   1 2003-05-25     1
2:   1 2003-06-06     2
3:   1 2003-06-25     3
4:   1 2003-06-26     4
5:   1 2003-06-28     5

For each id in A, I want to join the closest previous value (by date) from B. This gives the desired result:

A[B, value := i.value, on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> A
         date id value
1: 2003-05-24  1    NA
2: 2003-06-05  2    NA
3: 2003-06-24  1     2
4: 2003-06-25  1     3
5: 2003-06-27  2    NA

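The problem is that B has not just one column but several hundred. I really don't want to type out all those column names, e.g. valueXXX = i.valueXXX and so on, especially since the number and names of the columns in B may change.

So I tried doing the rolling join like this: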

C <- A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> C
         date id value
1: 2003-05-25  1     1
2: 2003-06-06  1     2
3: 2003-06-25  1     3
4: 2003-06-26  1     4
5: 2003-06-28  1     5

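As you can see, the result is not at all what I want. Can someone explain why data.table behaves this way? Also, what is the right way to get the result I want without hard-coding all those column names?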
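
For what it's worth, here is a minimal sketch of one way to read that behaviour. In `X[Y, on = ...]` the result has one row per row of `Y`, and the join columns take `Y`'s values, which is why `C` above looks like `B`. Swapping the roles, so that each row of `A` is looked up in `B`, and switching to `roll = Inf` (last observation carried forward) should bring every non-key column of `B` along without naming any of them. The `setnames` step and the name `res` are my own additions for illustration.

```r
library(data.table)

A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                id = c(1, 2, 1, 1, 2))
B <- data.table(idd = c(1, 1, 1, 1, 1),
                datee = c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                value = c(1, 2, 3, 4, 5))

# Look up each row of A in B. In `on`, the names refer to B's columns (x)
# and the values to A's columns (i).
# roll = Inf carries the latest B row dated on or before A's date forward.
res <- B[A, on = c(idd = "id", datee = "date"), roll = Inf]

# The join columns keep B's names but hold A's values; rename them to A's names.
setnames(res, c("idd", "datee"), c("id", "date"))
res
```

The result has one row per row of `A`, with `NA` wherever `B` has no earlier row for that id.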

Edit: The link Frank provided did solve my problem. Basically, define a vector of the variables to add, then use ":=" together with mget:

vars <- c("value")  # in my case hundreds of variables, but in this toy example just one

A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll=-Inf]
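
If the column list itself should not be hard-coded either, `vars` could be derived from `names(B)`. This is a minimal sketch, assuming the join keys `idd` and `datee` are the only columns of `B` that should not be copied. The extra column `value2` is hypothetical and only there to show more than one column being added.

```r
library(data.table)

A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                id = c(1, 2, 1, 1, 2))
B <- data.table(idd = c(1, 1, 1, 1, 1),
                datee = c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                value = c(1, 2, 3, 4, 5),
                value2 = c(10, 20, 30, 40, 50))  # hypothetical second column

# Every column of B that is not a join key gets copied into A.
vars <- setdiff(names(B), c("idd", "datee"))

# Same update join as above; mget() fetches the i.-prefixed columns of B by name.
A[B, (vars) := mget(paste0("i.", vars)), on = c(id = "idd", date = "datee"), roll = -Inf]
A
```

This way the call does not need to change when columns are added to or removed from `B`, as long as the key names stay the same.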