I want to delete the data for any id that has gaps between its minimum and maximum time period. Each id can start and end in any time period; that is fine. I just want to keep the ids that have no missing time periods between their min and max time.

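library(data.table)
set.seed(5)
data <- data.table(y = rnorm(100))
data[sample(1:100, 40), ] <- NA
# 10 ids, each observed at times 1..10
id <- rep(1:10, each = 10)
time <- seq(1, 10)
data2 <- data.frame(id, time)
data2$row <- 1:nrow(data2)
# drop a block of rows, then 5 more rows at random to create gaps
data2a <- subset(data2, row < 55 | row > 61)
data3 <- data2a[-sample(nrow(data2a), 5), ]
data.table(data3)
# rows remaining per id (base R; count() needs plyr, which is not loaded here)
table(data3$id)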

Here is a good example: group 1 should be deleted, but group 6, for example, should not.

2 Answers

The condition you want to filter on is that there are no gaps larger than 1 in time. diff(time) gives you the gaps, so all(diff(time) == 1) checks the condition.

So you can do this:

library(dplyr)
data3 %>%
    group_by(id) %>%
    filter(all(diff(time) == 1))

In data.table, one solution (which does the same thing) is:

setDT(data3)[, .SD[all(diff(time) == 1)], id]
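
To see what the check does on a case small enough to verify by hand, here is a minimal base R sketch. The `toy` data frame and the `no_gap` and `kept` names are made up for illustration and are not part of the question's data. It assumes time should be sorted within each id before calling diff(), which already holds for data3.

```r
# Hypothetical toy data: id 1 is missing time 3, id 2 is complete, id 3 has one row
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3),
  time = c(1, 2, 4, 5, 6, 7, 9)
)

# Sort so that diff() compares neighbouring time periods within each id
toy <- toy[order(toy$id, toy$time), ]

# For each id: TRUE when every step between consecutive times is exactly 1
no_gap <- tapply(toy$time, toy$id, function(t) all(diff(t) == 1))

# Keep only the rows whose id has no gaps
kept <- toy[as.character(toy$id) %in% names(no_gap)[no_gap], ]
unique(kept$id)  # ids 2 and 3 remain
```

Note that an id with a single row passes the check, because diff() of one value is empty and all() of an empty vector is TRUE.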
answered 2015-08-13T02:52:08.987

Using dplyr:

library(dplyr)
data3 %>% group_by(id) %>%
          filter(identical(time, seq(first(time), last(time))))
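
One caveat worth sketching: identical() also compares storage types, and seq(from, to) without a by argument returns an integer vector when from is a whole number. The filter above therefore works for the question's data, where time is integer, but would reject every group if time were stored as double. The `is_consecutive` helper below is a hypothetical alternative, not part of the answer, that compares values only.

```r
time_int <- 1:3          # integer, like time = seq(1, 10) in the question
time_dbl <- c(1, 2, 3)   # same values stored as double

identical(time_int, seq(time_int[1], time_int[3]))  # TRUE
identical(time_dbl, seq(time_dbl[1], time_dbl[3]))  # FALSE: double vs integer

# Hypothetical helper: TRUE when t is exactly first(t), first(t) + 1, ..., last(t)
is_consecutive <- function(t) {
  s <- seq(t[1], t[length(t)])
  length(t) == length(s) && all(t == s)
}

is_consecutive(time_dbl)    # TRUE
is_consecutive(c(1, 2, 4))  # FALSE: time 3 is missing
```

Under that assumption, the helper could be used as filter(is_consecutive(time)) in the grouped pipeline.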
answered 2015-08-13T02:51:54.130