I have a data frame like this:

data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"), 
                 diff = c(1,1,1,81,1,1,1,2,1,1,1)
)

Ultimately, I would like to end up with something like this:

data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"), 
                  diff = c(1,1,1,81,1,1,1,2,1,1,1),
                  block = c(1,1,1,2,2,2,2,3,3,1,1)
)

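So basically, I want to start a new block every time the value in the diff column is greater than 1, and I want to do this per group, i.e. per userID.

Right now I'm thinking of using LOCF or writing a loop, but it doesn't seem to work. Any suggestions? Thanks!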

2 Answers

One option is to group by 'userID' and then take the cumulative sum of the logical expression (diff > 1):

library(dplyr)
data %>% 
   group_by(userID) %>% 
   mutate(block = 1 + cumsum(diff > 1))
# A tibble: 11 x 3
# Groups:   userID [2]
#   userID  diff block
#   <fct>  <dbl> <dbl>
# 1 a          1     1
# 2 a          1     1
# 3 a          1     1
# 4 a         81     2
# 5 a          1     2
# 6 a          1     2
# 7 a          1     2
# 8 a          2     3
# 9 a          1     3
#10 b          1     1
#11 b          1     1
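
To see why the cumulative sum works, here is a minimal sketch that walks through the steps on the diff values of user "a" alone. The names diff_a, is_break and block_a are introduced here for illustration only and are not part of the original answer.

```r
# diff values for userID "a", copied from the question
diff_a <- c(1, 1, 1, 81, 1, 1, 1, 2, 1)

# TRUE wherever a new block should start
is_break <- diff_a > 1
# FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE

# cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at every break
cumsum(is_break)
# 0 0 0 1 1 1 1 2 2

# adding 1 makes the numbering start at 1 instead of 0
block_a <- 1 + cumsum(is_break)
block_a
# 1 1 1 2 2 2 2 3 3
```

One thing to keep in mind: if the first diff of a group is itself greater than 1, the numbering for that group starts at 2 rather than 1.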
answered 2019-07-23T15:47:37.380

In base R, you can use ave as follows:

data$block <- ave(data$diff>1, data$userID, FUN=cumsum)+1
#   userID diff block
#1       a    1     1
#2       a    1     1
#3       a    1     1
#4       a   81     2
#5       a    1     2
#6       a    1     2
#7       a    1     2
#8       a    2     3
#9       a    1     3
#10      b    1     1
#11      b    1     1
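
As a quick sanity check, the ave() result can be compared against the block column from the question. This is only a sketch: the vector name expected is introduced here, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# rebuild the example data from the question
data <- data.frame(
  userID = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b"),
  diff   = c(1, 1, 1, 81, 1, 1, 1, 2, 1, 1, 1)
)

# the block column the question asks for
expected <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 1)

# per-userID cumulative count of diff > 1, shifted to start at 1
data$block <- ave(data$diff > 1, data$userID, FUN = cumsum) + 1

# stops with an error if any row differs from the desired output
stopifnot(all(data$block == expected))
```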
answered 2019-07-23T15:53:12.200