r - Select only the 0s and the first 1 from a sequence of many 0s and a few 1s in R?

Question

I have a sequence of 0s and 1s like this:

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
                    0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

I want to select the 0s and the first 1 of each run of 1s.

The result should be:

ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)

What is the fastest way to do this in R?

score 16 · Accepted Answer

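Use rle() to extract the run lengths and values, do some minor surgery, and then put the run-length encoded vector back together with inverse.rle().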

rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
#  [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
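To see what the surgery does, it can help to look at the intermediate rle object. A minimal sketch using the xx from the question (res is just an illustrative name):

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

rr <- rle(xx)
rr$lengths
# [1] 4 2 5 3 4 4 3 1
rr$values
# [1] 0 1 0 1 0 1 0 1

# Shrink every run of 1s to length 1; runs of 0s are left alone
rr$lengths[rr$values == 1] <- 1
rr$lengths
# [1] 4 1 5 1 4 1 3 1

# Expand the modified runs back into a plain vector
res <- inverse.rle(rr)
```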

score 8 · Accepted Answer

Here's one way:

idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[-idx[pos]] # following Frank's suggestion
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
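One caveat that may be worth checking: negative indexing with an empty index vector returns an empty vector rather than the whole input, so the last line above drops everything when no 1 is directly followed by another 1. The sketch below uses a made-up input, no_runs, to show this, together with a setdiff() form that avoids it.

```r
no_runs <- c(0, 1, 0, 1)     # hypothetical input with no consecutive 1s

idx <- which(no_runs == 1)   # positions 2 and 4
pos <- which(diff(c(no_runs[1], idx)) == 1)   # integer(0): nothing to drop

# Negative indexing with an empty index selects nothing
bad <- no_runs[-idx[pos]]
length(bad)
# [1] 0

# Selecting the positions to keep handles the empty case
good <- no_runs[setdiff(seq_along(no_runs), idx[pos])]
good
# [1] 0 1 0 1
```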

score 7 · Accepted Answer

Without rle:

xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
#[1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
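The one-liner keeps an element when either it is not 1, or the element before it is not 1. A sketch that splits it into named steps on a short made-up vector (v, cur_not_one and prev_not_one are illustrative names, not from the answer):

```r
v <- c(0, 0, 1, 1, 1, 0, 1)   # short made-up input

cur_not_one <- v != 1
# Shift right by one; TRUE is prepended so a leading 1 counts as a first 1
prev_not_one <- head(c(TRUE, cur_not_one), -1)

# Keep zeros, plus any 1 whose predecessor is not a 1
keep <- prev_not_one | cur_not_one
res <- v[keep]
res
# [1] 0 0 1 0 1
```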

Since the OP mentioned speed, here's a benchmark:

josh = function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values==1] <- 1
  inverse.rle(rr)
}

arun = function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}

eddi = function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

simon = function(xx) {
    #  The body of the function is supplied in @SimonO101's answer
    first1(xx)
}

set.seed(1)
N = 1e6    
xx = sample(c(0,1), N, TRUE)

library(microbenchmark)
bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx) , times = 25)
print( bm , digits = 2 , order = "median" )
#Unit: milliseconds
#      expr min  lq median  uq max neval
# simon(xx)  20  21     23  26  72    25
#  eddi(xx)  97 102    104 118 149    25
#  arun(xx) 205 245    253 258 332    25
#  josh(xx) 228 268    275 287 365    25
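Before comparing timings, it can be worth confirming that the candidates return the same result. A sketch for the three pure-R functions only (simon() is left out because it needs the compiled first1()); the smaller sample size is an arbitrary choice to keep the check quick:

```r
josh <- function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values == 1] <- 1
  inverse.rle(rr)
}

arun <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}

eddi <- function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

set.seed(1)
small <- sample(c(0, 1), 1e4, TRUE)   # smaller than the benchmark input

# All three should produce exactly the same vector
same <- identical(josh(small), arun(small)) &&
        identical(josh(small), eddi(small))
same
```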

score 3 · Accepted Answer

Here's a quick Rcpp solution. It should be fairly fast (but I have no idea how it will stack up against the others here)...

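Rcpp::cppFunction( 'std::vector<int> first1( IntegerVector x ){
    std::vector<int> out;
    for( IntegerVector::iterator it = x.begin(); it != x.end(); ++it ){
        // keep every 0, and a 1 only when it starts a run of 1s;
        // the begin() check avoids reading before the first element
        if( *it == 0 || ( *it == 1 && ( it == x.begin() || *(it-1) != 1 ) ) )
          out.push_back(*it);
    }
    return out;
}')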

first1(xx)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1

score 2 · Accepted Answer

Even though I'm a staunch supporter of rle, since it's Friday here's an alternative method. I did it for fun, so YMMV.

yy<-paste(xx,collapse='')
zz<-gsub('[1]{1,}','1',yy)  #I probably screwed up the regex here
aa<- as.numeric(strsplit(zz,'')[[1]])
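For what it's worth, the regex does appear to work: '[1]{1,}' matches one or more consecutive 1s, which could also be written as '1+'. A quick check against the expected result from the question, as a sketch using that shorter pattern:

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)
ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)

yy <- paste(xx, collapse = '')
zz <- gsub('1+', '1', yy)                  # collapse each run of 1s to a single 1
aa <- as.numeric(strsplit(zz, '')[[1]])    # back to a numeric vector

identical(aa, ans)
# [1] TRUE
```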
