I have a data frame df, and I want to subset df based on the sequence of numbers within each category.

 x  <- c(1,2,3,4,5,7,9,11,13)
 x2 <- x+77 
 df <- data.frame(x=c(x,x2),y= c(rep("A",9),rep("B",9)))

 df
    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
6   7 A
7   9 A
8  11 A
9  13 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B
15 84 B
16 86 B
17 88 B
18 90 B

I only want the rows where x increases by 1, not the rows where x increases by 2, e.g.:

    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B

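I figure I have to do some subtraction between the elements, check whether the difference is >1, and combine that with a ddply, but this seems cumbersome. Is there some kind of sequence function I'm missing?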

2 Answers

Use diff:

df[which(c(1,diff(df$x))==1),]
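Note that diff() here runs over the whole column, so it also sees the jump between groups: on the sample data the step from 13 (last A) to 78 (first B) is 65, and row 10 (78 B) is therefore not kept by the one-liner. A minimal sketch of a per-group variant in base R follows, assuming the rows are already ordered within each value of y. The helper names step_prev, step_next, keep and res are illustrative and not part of the original answer.

```r
# Rebuild the sample data from the question
x  <- c(1, 2, 3, 4, 5, 7, 9, 11, 13)
df <- data.frame(x = c(x, x + 77), y = rep(c("A", "B"), each = 9))

# Step from the previous row and to the next row, computed within each group of y.
# NA marks the first (or last) row of a group, which has no neighbour on that side.
step_prev <- ave(df$x, df$y, FUN = function(v) c(NA, diff(v)))
step_next <- ave(df$x, df$y, FUN = function(v) c(diff(v), NA))

# Keep a row if it is exactly 1 away from a neighbour in the same group
keep <- (!is.na(step_prev) & step_prev == 1) |
        (!is.na(step_next) & step_next == 1)

res <- df[keep, ]
print(res)
```

Checking both neighbours is what lets the first row of each run (1 and 78) survive even though it has no predecessor inside its group.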
answered 2012-11-30T12:38:06.610

Your example appears to be well-behaved, and @agstudy's answer handles it nicely. However, should your data ever change...

myfun <- function(d, whichDiff = 1) {
  # d is the data.frame you'd like to subset, containing the variable 'x'
  # whichDiff is the difference between values of x you're looking for

  theWh <- which(!as.logical(diff(d$x) - whichDiff))
  # Take the diff of x, subtract whichDiff to get the desired values equal to 0
  # Coerce this to a logical vector and take the inverse (!)
  # which() gets the indexes that are TRUE.

  # allWh <- sapply(theWh, "+", 1)
  # Since the desired rows may be disjoint, use sapply to get each index + 1
  # Seriously? sapply to add 1 to a numeric vector? Not even on a Friday.
  allWh <- theWh + 1

  return(d[sort(unique(c(theWh, allWh))), ])
}

> library(plyr)
> 
> ddply(df, .(y), myfun)
    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
6  78 B
7  79 B
8  80 B
9  81 B
10 82 B
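
If plyr is not available, the same split-apply-combine step can be sketched in base R. This is only an illustration under stated assumptions: myfun is restated here in a condensed form (diff(d$x) == whichDiff selects the same positions as the !as.logical(...) expression for these values), and the names res and res2 are made up for the example.

```r
# Condensed restatement of myfun, so that the block runs on its own
myfun <- function(d, whichDiff = 1) {
  theWh <- which(diff(d$x) == whichDiff)   # positions whose next value is whichDiff larger
  d[sort(unique(c(theWh, theWh + 1))), ]   # keep each such position and its successor
}

# Sample data from the question
x  <- c(1, 2, 3, 4, 5, 7, 9, 11, 13)
df <- data.frame(x = c(x, x + 77), y = rep(c("A", "B"), each = 9))

# Split by y, apply myfun to each piece, and bind the pieces back together
res <- do.call(rbind, lapply(split(df, df$y), myfun))
rownames(res) <- NULL
print(res)

# The whichDiff argument picks out the rows that step by 2 instead
res2 <- do.call(rbind, lapply(split(df, df$y), myfun, whichDiff = 2))
rownames(res2) <- NULL
print(res2)
```

With whichDiff = 2 the result keeps the rows of each group that are linked by a step of 2, which on this data starts at 5 for A and at 82 for B.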
answered 2012-11-30T14:01:08.210