I have a data table of edits:

library(data.table)

edits <- data.table(proposal=c('A','A','A'),
           editField=c('probability','probability','probability'),
           startDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2018-10-10 15:47:00')),
           endDate=as.POSIXct(c('2019-09-06 12:12:00','2018-10-10 15:47:00','9999-12-31 05:00:00')),
           value=c(.1,.3,.1))

   proposal   editField           startDate             endDate value
1:        A probability 2017-04-14 00:00:00 2019-09-06 12:12:00   0.1
2:        A probability 2019-09-06 12:12:00 2018-10-10 15:47:00   0.3
3:        A probability 2018-10-10 15:47:00 9999-12-31 05:00:00   0.1

I want to join it to a data table of events:

events <- data.table(proposal='A',
    editDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2018-07-04 15:33:59','2018-07-27 08:01:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-11-26 11:10:00','2019-02-05 13:06:59')),
    editField=c('Created','stage','probability','estOrder','estOrder','estOrder','stage','probability','estOrder','estOrder','estOrder'))

    proposal            editDate   editField
 1:        A 2017-04-14 00:00:00     Created
 2:        A 2019-09-06 12:12:00       stage
 3:        A 2019-09-06 12:12:00 probability
 4:        A 2019-09-06 12:12:00    estOrder
 5:        A 2018-07-04 15:33:59    estOrder
 6:        A 2018-07-27 08:01:00    estOrder
 7:        A 2018-10-10 15:47:00       stage
 8:        A 2018-10-10 15:47:00 probability
 9:        A 2018-10-10 15:47:00    estOrder
10:        A 2018-11-26 11:10:00    estOrder
11:        A 2019-02-05 13:06:59    estOrder

I want output like the following, where value gives the probability value in effect when each edit occurred:

desired.join <- cbind(events, value=c(.1,.3,.3,.3,.3,.3,.3,.1,.1,.1,.1))
    proposal            editDate   editField value
 1:        A 2017-04-14 00:00:00     Created   0.1
 2:        A 2019-09-06 12:12:00       stage   0.3
 3:        A 2019-09-06 12:12:00 probability   0.3
 4:        A 2019-09-06 12:12:00    estOrder   0.3
 5:        A 2018-07-04 15:33:59    estOrder   0.3
 6:        A 2018-07-27 08:01:00    estOrder   0.3
 7:        A 2018-10-10 15:47:00       stage   0.3
 8:        A 2018-10-10 15:47:00 probability   0.1
 9:        A 2018-10-10 15:47:00    estOrder   0.1
10:        A 2018-11-26 11:10:00    estOrder   0.1
11:        A 2019-02-05 13:06:59    estOrder   0.1

So far, this is how I have tried to join the two:

edits[editField=='probability'][events, on=.(proposal, startDate<=editDate, endDate>editDate)]

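However, when I try this, I get an error message saying "Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 16 rows; more than 14 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice."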

1 Answer

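It looks like you are trying to join edits and events so that the probability value from the edits data table is attached to the right observation in the events data table.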

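The error appears to occur because the time intervals used to build the edits data table are not mutually exclusive: as posted, rows 1 and 3 overlap, and row 2 starts after it ends. When I change the intervals to what I think you intended, your code gives the result you are looking for.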
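To see the overlap concretely, here is a minimal sketch that runs the same non-equi join on the intervals as originally posted, with allow.cartesian=TRUE so that it completes, and then counts how many intervals each event lands in. The names edits_orig, events_orig, id, joined and multi are mine, for illustration only, and tz = "UTC" is an assumption added just to make the example reproducible.

```r
library(data.table)

# Intervals exactly as posted in the question (tz = "UTC" is an assumption)
edits_orig <- data.table(
  proposal  = "A",
  startDate = as.POSIXct(c("2017-04-14 00:00:00", "2019-09-06 12:12:00",
                           "2018-10-10 15:47:00"), tz = "UTC"),
  endDate   = as.POSIXct(c("2019-09-06 12:12:00", "2018-10-10 15:47:00",
                           "9999-12-31 05:00:00"), tz = "UTC"),
  value     = c(.1, .3, .1)
)

# Event timestamps as posted; id is a hypothetical helper column
events_orig <- data.table(
  proposal = "A",
  editDate = as.POSIXct(c("2017-04-14 00:00:00", "2019-09-06 12:12:00",
                          "2019-09-06 12:12:00", "2019-09-06 12:12:00",
                          "2018-07-04 15:33:59", "2018-07-27 08:01:00",
                          "2018-10-10 15:47:00", "2018-10-10 15:47:00",
                          "2018-10-10 15:47:00", "2018-11-26 11:10:00",
                          "2019-02-05 13:06:59"), tz = "UTC")
)
events_orig[, id := .I]

# Same join as in the question, but allowed to return more rows than
# nrow(x) + nrow(i)
joined <- edits_orig[events_orig,
                     on = .(proposal, startDate <= editDate, endDate > editDate),
                     allow.cartesian = TRUE]

# Events that fall into more than one interval
multi <- joined[, .N, by = id][N > 1]
print(nrow(joined))
print(multi)
```

On the data as posted, I would expect joined to have the 16 rows the error message mentions, because the five events from 2018-10-10 15:47:00 through 2019-02-05 13:06:59 each match both row 1 and row 3. With the reordered intervals used below, the same check should report no event matching more than once.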

library(data.table)

edits <- data.table(proposal=c('A','A','A'),
    editField=c('probability','probability','probability'),
    startDate=as.POSIXct(c('2017-04-14 00:00:00','2018-10-10 15:47:00','2019-09-06 12:12:00')),
    endDate=as.POSIXct(c('2018-10-10 15:47:00','2019-09-06 12:12:00','9999-12-31 05:00:00')),
    value=c(.1,.3,.1))

events <- data.table(proposal='A',
    editDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2018-07-04 15:33:59','2018-07-27 08:01:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-11-26 11:10:00','2019-02-05 13:06:59')),
    editField=c('Created','stage','probability','estOrder','estOrder','estOrder','stage','probability','estOrder','estOrder','estOrder'))

edits[editField=='probability'][events, on=.(proposal, startDate<=editDate, endDate>editDate)]

Or you can do the join without chaining:

  edits[events, on=.(proposal, startDate<=editDate, endDate>editDate)]

Or you could follow Jonny Phelps's suggestion and use foverlaps, but this also requires mutually exclusive time intervals in the edits data table:

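# Give each event a zero-width interval that starts and ends at editDate
events[, startDate := editDate]

# Key both tables so the last two key columns are the interval start and end
setkey(events, startDate, editDate)

setkey(edits, startDate, endDate)

# For each event, keep the first edits interval it overlaps
foverlaps(events, edits, type="any", mult="first")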
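If the goal is only to add the value column to events, as in desired.join, an update join is another option. The following is a minimal sketch under the same assumption of non-overlapping intervals. The names edits_fixed and events_demo are hypothetical, the events are a shortened subset of the posted timestamps, and tz = "UTC" is again my assumption for reproducibility.

```r
library(data.table)

# Non-overlapping intervals, as in the corrected edits table above
edits_fixed <- data.table(
  proposal  = "A",
  startDate = as.POSIXct(c("2017-04-14 00:00:00", "2018-10-10 15:47:00",
                           "2019-09-06 12:12:00"), tz = "UTC"),
  endDate   = as.POSIXct(c("2018-10-10 15:47:00", "2019-09-06 12:12:00",
                           "9999-12-31 05:00:00"), tz = "UTC"),
  value     = c(.1, .3, .1)
)

# A shortened, hypothetical subset of the events
events_demo <- data.table(
  proposal = "A",
  editDate = as.POSIXct(c("2017-04-14 00:00:00", "2018-07-04 15:33:59",
                          "2018-10-10 15:47:00", "2019-02-05 13:06:59",
                          "2019-09-06 12:12:00"), tz = "UTC")
)

# Update join: copy value from the interval with
# startDate <= editDate < endDate into events_demo, by reference
events_demo[edits_fixed,
            on = .(proposal, editDate >= startDate, editDate < endDate),
            value := i.value]

print(events_demo)
```

Because the assignment happens by reference, events_demo keeps its original row order and columns and simply gains value. An event exactly on a boundary, such as 2018-10-10 15:47:00, is assigned to the interval that starts there, since the start is inclusive and the end is exclusive.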
answered 2019-12-01T00:20:23.453