
Given the following vector:

vec01 <- c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 1, 2, 1,
           2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2,
           1, 2, 2, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1,
           2, 1, 2, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1)

Question 1: How can I remove the anomalies highlighted below?

vec01 <- c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 1, 2, 1,
           2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, *2*, *2*,
           1, 2, *2*, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1,
           2, 1, 2, 3, 4, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1)

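Definition of an anomaly: every element must belong to a series 1, 2, ... (starting at 1 and increasing by 1). Elements that do not are anomalies; they are the ones marked in bold (shown as *2*) above.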

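Question 2: After removing the anomalies, how can I identify the sequence groups, where each sequence belongs to one group? That is, the output should look like this: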

result <- structure(list(vec = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
                                 4L, 5L, 6L, 7L, 8L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
                                 1L, 1L, 1L, 2L, 1L, 2L),
                         group = c(1L, 1L, 2L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 8L,
                                   8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
                                   11L, 12L, 13L, 14L, 15L, 15L, 16L, 16L)),
                         .Names = c("vec", "group"),
                         row.names = c(NA, 30L), class = "data.frame")

3 Answers

7

This answers Question 2 (and Question 1 as well, if you drop all rows flagged TRUE at the end):

library(data.table)  # data.table for its concise syntax (a matter of personal taste)
DT <- data.table(vec01)
DT[, counter := ifelse(vec01 == 1, 1, 0)]       # mark every element that starts a sequence (value 1)
DT[, counter := cumsum(counter)]                # running total gives each sequence its own ID, usable with by
DT[, flag := is.unsorted(vec01), by = counter]  # check whether each sequence is sorted

Edit: replace is.unsorted(vec01) with f(vec01), where f = function(x){!(x==Reduce(max,x,accumulate=T))}
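
As a rough sketch of how the edit might be applied end to end: the toy vector x and the stricter helper g below are assumptions for illustration, not part of the answer. f flags elements that fall below the running maximum of their sequence, so a repeated value such as the last 2 in 1, 2, 2 is not flagged. Comparing each element with its position in the sequence appears to catch that case too.

```r
library(data.table)

# Toy vector (hypothetical), built from fragments of vec01;
# anomalies sit at positions 4, 5, 8 and 14
x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1, 1, 2, 3, 4, 2, 1)

# f from the edit: TRUE where an element is below the running maximum
f <- function(v) !(v == Reduce(max, v, accumulate = TRUE))

# g (assumption, not from the answer): TRUE from the first element that
# differs from its position within the sequence, and for everything after it
g <- function(v) cumsum(v != seq_along(v)) > 0

DT <- data.table(x)
DT[, counter := cumsum(x == 1)]      # one ID per sequence starting at 1
DT[, flag_f := f(x), by = counter]
DT[, flag_g := g(x), by = counter]

which(DT$flag_f)                     # 4 5 14    (misses position 8)
which(DT$flag_g)                     # 4 5 8 14

clean <- DT[flag_g == FALSE, x]      # Question 1: drop the flagged elements
```

Dropping the flagged rows leaves 1, 2, 3, 1, 2, 1, 1, 2, 3, 4, 1, and the counter column of the remaining rows can serve as the group ID for Question 2.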

answered 2013-03-17T10:55:45.763
2

To clean up the sequence (Question 1):

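m <- vec01[1] == 1  # keep the first element only if it is 1
for (i in seq(2, length(vec01))) {
    # keep an element if it is 1, or if it continues a kept run (previous value + 1)
    m[i] <- vec01[i] == 1 || (vec01[i] == vec01[i-1] + 1 && m[i-1])
}
vec01 <- vec01[m]   # drop the anomalies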

Now build the structure you want (thanks to @statquant for the cumsum() idea):

data.frame(vec=vec01, group=cumsum(c(1,diff(vec01)!=1)))
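
To see both steps on a small input, here is a minimal sketch. The function name clean_runs and the toy vector x are hypothetical; the logic is meant to mirror the loop above rather than replace it.

```r
# Hypothetical helper wrapping the loop above: keep an element if it is 1,
# or if it equals the previous value + 1 and the previous element was kept
clean_runs <- function(v) {
  keep <- logical(length(v))
  for (i in seq_along(v)) {
    keep[i] <- v[i] == 1 || (i > 1 && v[i] == v[i - 1] + 1 && keep[i - 1])
  }
  v[keep]
}

x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1, 1, 2, 3, 4, 2, 1)  # toy input (assumption)
cleaned <- clean_runs(x)

# After cleaning, any step other than +1 marks the start of a new group
out <- data.frame(vec = cleaned, group = cumsum(c(1, diff(cleaned) != 1)))
print(out)
```

Here out$vec is 1, 2, 3, 1, 2, 1, 1, 2, 3, 4, 1 and out$group is 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5. Using seq_along() instead of seq(2, length(v)) avoids the loop counting down (2, 1) when the input has a single element.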
answered 2013-03-17T10:42:36.887
2

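Interesting question; here is another solution. It looks at where the values are incrementing and builds the corresponding ideal (anomaly-free) sequence vec02. Then it is just a matter of comparing vec01 and vec02.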

is.incr <- c(FALSE, diff(vec01) == 1)
vec02   <- rep(1, length(vec01)) + sequence(rle(is.incr)$lengths) * is.incr
vec     <- vec01[vec01 == vec02]
result  <- data.frame(vec = vec, group = cumsum(vec == 1))
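
The intermediate vectors are easier to follow on a short input. The sketch below traces them for a toy vector x (an assumption for illustration; the names runs, pos and x2 are also hypothetical).

```r
x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1)   # toy input (assumption)

is.incr <- c(FALSE, diff(x) == 1)   # TRUE where the value is previous + 1
runs    <- rle(is.incr)$lengths     # lengths of consecutive TRUE/FALSE runs
pos     <- sequence(runs)           # position of each element within its run
x2      <- 1 + pos * is.incr        # ideal value: 1, or 1 + k at the k-th consecutive increment

print(rbind(x, is.incr, pos, x2))   # side-by-side view of each step

kept <- x[x == x2]                  # elements that match the ideal sequence
data.frame(vec = kept, group = cumsum(kept == 1))
```

For this input x2 is 1, 2, 3, 1, 1, 1, 2, 1, 1, so the elements at positions 4, 5 and 8 are dropped and kept is 1, 2, 3, 1, 2, 1 with groups 1, 1, 1, 2, 2, 3.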
answered 2013-03-18T04:08:36.087