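I could do this by looping over my dataset several times, but I think there must be a more efficient way to do it with data.table. This is what the dataset looks like: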
CaseID  Won  OwnerID  Time_period  Finished
1       yes  A        1            no
1       yes  A        3            no
1       yes  A        5            yes
2       no   A        4            no
2       no   A        6            yes
3       yes  A        2            yes
4       no   A        3            yes
5       15   B        2            no
For each row, I want to compute, per owner, the average of Won over the cases that finished before that row's time period.
CaseID  Won  OwnerID  Time_period  Finished  AvgWonByOwner
1       yes  A        1            no        NA
1       yes  A        3            no        1
1       yes  A        5            yes       .5
2       no   A        4            no        .5
2       no   A        6            yes       2/3
3       yes  A        2            yes       NA
4       no   A        3            yes       1
5       15   B        2            no        NA
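Looking at this closely, it seems ridiculously complex. I think you could do this with some kind of rolling merge, but I don't know how to set the condition that the average of Won is computed only from cases that finished before the row's time period and that have the same OwnerID.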
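One way the condition described above might be expressed is a non-equi join rather than a rolling merge. The following is only a sketch under stated assumptions: `finished`, `fin_time`, `won`, and `res` are names made up for illustration, Won is coded as 1 for "yes" and 0 for anything else, and "before" is taken to mean strictly earlier. The value "15" for case 5 is copied verbatim from the sample and never enters an average, since that case does not finish.

```r
library(data.table)

# Sample data copied from the table in the question
dt <- data.table(
  CaseID      = c(1, 1, 1, 2, 2, 3, 4, 5),
  Won         = c("yes", "yes", "yes", "no", "no", "yes", "no", "15"),
  OwnerID     = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Time_period = c(1, 3, 5, 4, 6, 2, 3, 2),
  Finished    = c("no", "no", "yes", "no", "yes", "yes", "yes", "no")
)

# One row per finished case: its owner, finishing time, and outcome (1 = won).
# The column names fin_time and won are hypothetical helpers.
finished <- dt[Finished == "yes",
               .(OwnerID, fin_time = Time_period, won = as.numeric(Won == "yes"))]

# Non-equi join: for every row of dt, match the finished cases of the same
# owner whose finishing time is strictly earlier, and average their outcomes.
# by = .EACHI evaluates j once per row of dt, in dt's row order; rows with no
# match get NA.
res <- finished[dt,
                on = .(OwnerID, fin_time < Time_period),
                .(AvgWonByOwner = mean(won)),
                by = .EACHI]

dt[, AvgWonByOwner := res$AvgWonByOwner]
print(dt)
```

On the sample data this should give NA, 1, 0.5, 0.5, 2/3, NA, 1, NA in the new column, matching the expected output. If 0 is preferred over NA when no case has finished yet, the NA values could be replaced afterwards.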
Edit 1: Explanation of the numbers in the last column
AvgWonByOwner  Explanation
NA             t = 1, no cases finished yet, this could be 0 too
1              t = 3, case 3 finished and is won, so average wins is 1
.5             t = 5, case 3 finished, won; case 4 finished, lost; average = .5
.5             t = 4, case 3 finished, won; case 4 finished, lost; average = .5
2/3            t = 6, case 3 finished, won; case 4 finished, lost; case 1 finished, won; average = 2/3
NA             t = 2, no cases finished yet, this could be 0 too
1              t = 3, case 3 finished and is won, so average wins is 1
NA             t = 2, no cases finished yet for this owner, this could be 0 too