I have a sequence of 0s and 1s like this:
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)
I want to keep every 0 and only the first 1 of each run of consecutive 1s.
The result should be:
ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)
What is the fastest way to do this in R?
Use rle() to extract the run lengths and values, do some minor surgery, and then put the vector back together with inverse.rle().
rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
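To see what the modification of the run lengths does, it may help to look at the intermediate rle object. The following is only a sketch on the question's xx; the names rr_demo and res_rle are illustrative and not part of the original answer.

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

# rle() splits the vector into runs of equal values
rr_demo <- rle(xx)
rr_demo$lengths  # 4 2 5 3 4 4 3 1
rr_demo$values   # 0 1 0 1 0 1 0 1

# Shrink every run of 1s to length 1; runs of 0s keep their length
rr_demo$lengths[rr_demo$values == 1] <- 1
rr_demo$lengths  # 4 1 5 1 4 1 3 1

# Rebuild the vector from the modified runs
res_rle <- inverse.rle(rr_demo)
```

Because only the lengths of the 1-runs are touched, the 0s come back unchanged.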
Here is one way:
idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[-idx[pos]] # following Frank's suggestion
# Caveat: if pos is empty (no two consecutive 1s), xx[-integer(0)] returns an empty vector
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Without rle:
xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
#[1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
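The one-liner above can be unpacked into named steps. This is a sketch of how I read it, not the answerer's own wording; is_not_one, prev_not_one, keep and res_mask are hypothetical names introduced for illustration.

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

# TRUE wherever the current element is not a 1
is_not_one <- xx != 1

# The same flags shifted right by one position: TRUE wherever the
# previous element is not a 1. The first element has no predecessor,
# so it is treated as if the predecessor were not a 1.
prev_not_one <- head(c(TRUE, is_not_one), -1)

# Keep an element if it is a 0, or if it is a 1 that does not follow a 1
keep <- is_not_one | prev_not_one
res_mask <- xx[keep]
```

Everything here is a vectorized logical operation, which is probably why this approach does well among the pure R solutions.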
Since the OP mentioned speed, here is a benchmark:
josh = function(xx) {
rr <- rle(xx)
rr$lengths[rr$values==1] <- 1
inverse.rle(rr)
}
arun = function(xx) {
idx <- which(xx == 1)
pos <- which(diff(c(xx[1], idx)) == 1)
xx[setdiff(seq_along(xx), idx[pos])]
}
eddi = function(xx) {
xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}
simon = function(xx) {
# The body of the function is supplied in @SimonO101's answer
first1(xx)
}
set.seed(1)
N = 1e6
xx = sample(c(0, 1), N, replace = TRUE)
library(microbenchmark)
bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx) , times = 25)
print( bm , digits = 2 , order = "median" )
#Unit: milliseconds
# expr min lq median uq max neval
# simon(xx) 20 21 23 26 72 25
# eddi(xx) 97 102 104 118 149 25
# arun(xx) 205 245 253 258 332 25
# josh(xx) 228 268 275 287 365 25
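Before comparing timings, it can be worth confirming that the candidates agree on the same input. The sketch below does this for the three pure R functions only (simon() is left out because it needs the compiled Rcpp function); the test vector small, the seed, and the names same_ja and same_je are assumptions made for illustration.

```r
josh <- function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values == 1] <- 1
  inverse.rle(rr)
}
arun <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}
eddi <- function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

# A small random 0/1 vector, separate from the benchmark data
set.seed(42)
small <- sample(c(0, 1), 1000, replace = TRUE)

# Compare each result against the rle-based one
same_ja <- isTRUE(all.equal(josh(small), arun(small)))
same_je <- isTRUE(all.equal(josh(small), eddi(small)))
```

If either flag is FALSE, the timings would be comparing functions that do different things.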
Here is a quick Rcpp solution. It should be fast (but I don't know how it will compare with the others here)...
Rcpp::cppFunction( 'std::vector<int> first1( IntegerVector x ){
std::vector<int> out;
for( IntegerVector::iterator it = x.begin(); it != x.end(); ++it ){
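// Keep every 0, and a 1 only when it is the first element or does not follow a 1.
// The begin() check avoids reading *(it-1) before the start of the vector.
if( *it == 0 || ( *it == 1 && ( it == x.begin() || *(it-1) != 1 ) ) )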
out.push_back(*it);
}
return out;
}')
first1(xx)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Even though I'm a firm supporter of rle, since it's Friday, here is another approach. I did this for fun, so YMMV.
yy<-paste(xx,collapse='')
zz<-gsub('[1]{1,}','1',yy) #I probably screwed up the regex here
aa<- as.numeric(strsplit(zz,'')[[1]])
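For what it's worth, the pattern '[1]{1,}' means "one or more 1s", so it should behave the same as the shorter '1+'. A sketch of the same idea with that pattern, run on the question's xx; this is a restatement for checking, not a change to the answer, and it relies on every value being a single digit.

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

# Collapse the vector into one string of digits
yy <- paste(xx, collapse = "")

# Replace each run of one or more 1s with a single 1
zz <- gsub("1+", "1", yy)

# Split back into single characters and convert to numbers
aa <- as.numeric(strsplit(zz, "")[[1]])
```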