I have two vectors:

x<-c(0,1,0,2,3,0,1,1,0,2)
y<-c("00:01:00","00:02:00","00:03:00","00:04:00","00:05:00",
     "00:06:00","00:07:00","00:08:00","00:09:00","00:10:00")

I need to select only those elements of y whose x values are not interrupted by 0, i.e. that belong to a run of at least two consecutive non-zero values. As a result, I'd like to get a data frame like this:

y        x
00:04:00 2
00:05:00 3
00:07:00 1
00:08:00 1

We wrote the script below, but it is slow on a big dataset. Is there a more elegant solution? Also, why does df<-rbind(bbb,df) return df in reversed order?

aaa<-data.frame(y,x)
df<-NULL
for (i in 1:length(aaa$x)){
  bbb<-ifelse((aaa$x[i]*aaa$x[i+1])!=0, 
              aaa$x[i], 
              ifelse((aaa$x[i]*aaa$x[i-1])!=0, 
                     aaa$x[i], 
                     NA))
  df<-rbind(bbb,df)
}
df<-data.frame(rev(df))
aaa$x<-df$rev.df.
bbb<-na.omit(aaa)
bbb

I'm a newbie in R, so please, as much detail as you can :) Thank you!

1 Answer

aaa <- data.frame(y,x)
rles <- rle(aaa$x == 0)
bbb <- aaa[rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths),]

This gives

> bbb
         y x
4 00:04:00 2
5 00:05:00 3
7 00:07:00 1
8 00:08:00 1

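As for your sub-question: df<-rbind(bbb,df) returns df reversed because you are adding the new row (bbb) before the rest of the (existing) rows. Swap the order of the arguments and you won't need to reverse df.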
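A minimal sketch of that ordering effect, using a hypothetical three-element input (the names vals, prepended and appended are made up for illustration and are not part of the question's script):

```r
# Hypothetical toy input, only to show the row order produced by rbind().
vals <- c(10, 20, 30)

prepended <- NULL
appended  <- NULL
for (v in vals) {
  prepended <- rbind(v, prepended)  # new row goes on top: order ends up reversed
  appended  <- rbind(appended, v)   # new row goes at the bottom: order is kept
}

as.vector(prepended)  # 30 20 10
as.vector(appended)   # 10 20 30
```

Either way, growing an object with rbind inside a loop copies it on every iteration, which is a likely reason the original script is slow on a big dataset.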

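Now to break down the answer, since there are a lot of parts to it. First, to rephrase your criterion: you want stretches of aaa that have at least 2 consecutive rows without a 0. So the first step is to find the 0s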

> aaa$x == 0
 [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

Then you want to figure out the lengths of these stretches; rle does this.

> rle(aaa$x == 0)
Run Length Encoding
  lengths: int [1:8] 1 1 1 2 1 2 1 1
  values : logi [1:8] TRUE FALSE TRUE FALSE TRUE FALSE ...

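This means there is 1 TRUE, then 1 FALSE, then 1 TRUE, then 2 FALSEs, and so on. This result is assigned to rles. The parts you want are those where the value is FALSE (x is not 0) and the length of that run is 2 or more.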

> rles$values == FALSE & rles$lengths >= 2
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

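This needs to be expanded back out to the length of aaa, and rep will do that, using rles$lengths to replicate each entry the appropriate number of times.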
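To see this use of rep in isolation, here is a small sketch with hypothetical per-run flags and run lengths (flags, lens and expanded are illustrative names, not taken from the data above):

```r
# Each element of the first argument is repeated the number of times
# given by the matching element of the second argument.
flags <- c(TRUE, FALSE, TRUE)  # hypothetical per-run flags
lens  <- c(2L, 1L, 3L)         # hypothetical run lengths

expanded <- rep(flags, lens)
expanded                       # TRUE TRUE FALSE TRUE TRUE TRUE
length(expanded) == sum(lens)  # TRUE: one entry per original element
```

Applied to the real data, the same call looks like this: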

> rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths)
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE

This gives a logical vector suitable for indexing aaa

> aaa[rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths),]
         y x
4 00:04:00 2
5 00:05:00 3
7 00:07:00 1
8 00:08:00 1
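
If you need this more than once, the steps above could be wrapped in a small function. This is only a sketch: the name keep_nonzero_runs and its arguments col and min_run are hypothetical, and it assumes the column contains no NA values.

```r
# Hypothetical helper: keep the rows whose value in `col` lies in a run of
# at least `min_run` consecutive non-zero values. Assumes no NAs in `col`.
keep_nonzero_runs <- function(df, col = "x", min_run = 2) {
  runs <- rle(df[[col]] != 0)                       # runs of non-zero / zero
  keep <- rep(runs$values & runs$lengths >= min_run, runs$lengths)
  df[keep, , drop = FALSE]
}

x <- c(0, 1, 0, 2, 3, 0, 1, 1, 0, 2)
y <- c("00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00",
       "00:06:00", "00:07:00", "00:08:00", "00:09:00", "00:10:00")

res <- keep_nonzero_runs(data.frame(y, x))
rownames(res)  # "4" "5" "7" "8"
```

Testing df[[col]] != 0 instead of == 0 just flips the run values, so the non-zero runs are the TRUE ones and no comparison against FALSE is needed.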
answered 2012-10-08T22:51:14.860