I have a data table like the following:

DT <- data.table(day = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Consumption = c(5, 9, 10, 2, NA, NA, NA, NA),
                 id = c(1, 2, 3, 1, 1, 2, 2, 1))

   day Consumption id
1:   1           5  1
2:   2           9  2
3:   3          10  3
4:   4           2  1
5:   5          NA  1
6:   6          NA  2
7:   7          NA  2
8:   8          NA  1

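I want to create two columns, computed within each id group: one holding the last non-NA Consumption value before each observation, and one holding the number of days between those two observations. So far I have tried this: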

DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  1 NA
8:   8          NA  1  3 NA 

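However, I want the lagged (n = 1) consumption value to come from a row that has a non-NA Consumption value. For example, in row 7 the value in column "yj" is NA because it is taken from row 6, whose Consumption is NA. I want it to be taken from row 2 instead. So the data table I would like to end up with is: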

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  5  9
8:   8          NA  1  4  2

Note: I am deliberately using the n argument of shift because, in the next step, I will also need the second-to-last non-NA Consumption value.

Thank you.

1 Answer

Here is one possible solution:

library(data.table)
library(zoo)

DT[, `:=`(day_shift = shift(day),
          yj = shift(Consumption)),
   by = id]

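# where the lagged consumption is NA, blank out the lagged day as well
# (day is a double column, so NA_real_ is the matching NA type)
DT[is.na(yj), day_shift := NA_real_]

# carry the last non-NA value forward within each id
DT[,
   `:=`(day_shift = zoo::na.locf(day_shift, na.rm = FALSE),
        yj = zoo::na.locf(yj, na.rm = FALSE)),
   by = id]

# finally, compute j as the gap in days to that observation
DT[, j := day - day_shift]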

# you can clean up the ordering or remove columns later
DT

   day Consumption id day_shift yj  j
1:   1           5  1        NA NA NA
2:   2           9  2        NA NA NA
3:   3          10  3        NA NA NA
4:   4           2  1         1  5  3
5:   5          NA  1         4  2  1
6:   6          NA  2         2  9  4
7:   7          NA  2         2  9  5
8:   8          NA  1         4  2  4
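
The question also mentions needing the second-to-last non-NA value in a later step, which the fill-forward approach above does not give directly. Below is a minimal sketch of one way to generalize it, without zoo. The helper name prev_non_na_idx and the column yj2 are hypothetical (they are not part of data.table), and the sketch assumes the rows are already sorted by day within each id.

```r
library(data.table)

DT <- data.table(day = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Consumption = c(5, 9, 10, 2, NA, NA, NA, NA),
                 id = c(1, 2, 3, 1, 1, 2, 2, 1))

# Hypothetical helper: for each position i in x, return the index of the
# k-th most recent non-NA element strictly before i (NA if fewer than k exist).
prev_non_na_idx <- function(x, k = 1L) {
  ok   <- !is.na(x)
  nn   <- which(ok)            # positions of the non-NA elements
  seen <- cumsum(ok) - ok      # how many non-NA elements come before each position
  pos  <- seen - k + 1L        # rank, within nn, of the element we want
  pos[pos < 1L] <- NA_integer_ # not enough earlier observations
  nn[pos]
}

# k = 1: last non-NA observation before each row, per id
DT[, c("j", "yj") := {
  i <- prev_non_na_idx(Consumption, k = 1L)
  list(day - day[i], Consumption[i])
}, by = id]

# k = 2: second-to-last non-NA observation before each row, per id
DT[, yj2 := Consumption[prev_non_na_idx(Consumption, k = 2L)], by = id]

DT
```

With the sample data, j and yj match the table the question asks for, and yj2 is 5 in rows 5 and 8 (the only rows with two earlier non-NA observations in their group) and NA elsewhere. If the data is not sorted, something like setorder(DT, id, day) beforehand would be needed, though that also changes the row order of the result.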
Answered 2019-09-14T11:37:55.463