How can I get run-length-unique (rle-style unique) tuples in a df like this?

structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02", 
"M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04", 
"F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
    NULL, c("a", "b")))

> sample
      a     b    
 [1,] "M01" "F04"
 [2,] "M01" "F04"
 [3,] "M01" "F02"
 [4,] "M01" "F02"
 [5,] "M01" "F04"
 [6,] "M02" "F10"
 [7,] "M02" "F10"
 [8,] "M02" NA   
 [9,] "M02" "F10"
[10,] "M03" "F01"
[11,] "M03" "F01"

to get this:

structure(c("M01", "M01", "M01", "M02", "M02", "M03", "F04", 
"F02", "F04", "F10", "F10", "F01"), .Dim = c(6L, 2L), .Dimnames = list(
    NULL, c("d", "c")))
> output
     d     c    
[1,] "M01" "F04"
[2,] "M01" "F02"
[3,] "M01" "F04"
[4,] "M02" "F10"
[5,] "M02" "F10"
[6,] "M03" "F01"

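So the idea is to get a df of tuples that is unique row by row, but only relative to the preceding element, which is why unique(sample) does not give me what I need. Can rle be run on this df so that it considers tuples and returns a df as output? Is there a better way?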

rle(sample[,2])$values

gives the desired result for column 2, but obviously I lose the valuable information in column 1.

1 Answer

How about this?

# dd is the matrix structure you posted in the question
dd <- as.data.frame(dd)                     ## convert to data.frame
dd[] <- lapply(dd, as.character)            ## change columns to character
na.omit(dd[cumsum(rle(dd$b)$lengths), ])    ## cumsum of the rle lengths = last row index of each run in b
                                            ## wrap in na.omit to drop the NA row
#      a   b
# 2  M01 F04
# 4  M01 F02
# 5  M01 F04
# 7  M02 F10
# 9  M02 F10
# 11 M03 F01
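
Note that the snippet above looks only at column b, so two consecutive rows with the same b but a different a would be collapsed into one. If runs should be detected on the whole (a, b) tuple, one possible sketch is to paste both columns into a single key and run rle on that key. The helper name rle_rows and the "\r" separator are assumptions made for illustration (the separator is assumed never to occur in the data); neither comes from the question.

```r
# Sample matrix from the question
sample <- structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02",
"M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04",
"F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
    NULL, c("a", "b")))

# Hypothetical helper: keep the last row of each run of identical (a, b) tuples
rle_rows <- function(df) {
  # paste() turns NA into the string "NA", so an NA row forms its own run
  key  <- paste(df$a, df$b, sep = "\r")   # assumes "\r" never occurs in the data
  last <- cumsum(rle(key)$lengths)        # last row index of each run
  na.omit(df[last, ])                     # drop rows that contain NA
}

dd  <- as.data.frame(sample, stringsAsFactors = FALSE)
res <- rle_rows(dd)
res
```

On the sample data this should return the same six rows (2, 4, 5, 7, 9, 11) as the column-b version, because a never changes inside a run of b there. The two approaches differ only when a changes while b stays the same.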
answered 2013-03-19T15:36:34.357