I am trying to simply select the first two or three values of a lead variable.
Imagine my data looks like this:
id variable leadvar
1 a 0 0
2 a 1 0
3 a 1 0
4 b 0 0
5 b 0 0
6 b 1 0
7 c 0 0
8 c 0 0
9 c 0 0
10 d 1 0
11 d 1 0
12 d 1 0
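What I want is to first lead variable on the condition that lead minus 1 = 0 (for each id). This means that leadvar should take 1 on a row where variable is 0 and the next row's variable is 1, that is, when a 1 is preceded by a 0. For example: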
id variable leadvar
1 a 0 1
2 a 1 0
3 a 1 0
4 b 0 0
5 b 0 1
6 b 1 0
7 c 0 0
8 c 0 0
9 c 0 0
10 d 1 0
11 d 1 0
12 d 1 0
Then I want to select the first row after the lead (along with the lead row itself), like this:
id variable leadvar
a 0 1
a 1 0
b 0 1
b 1 0
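I am struggling with the last step. I would like to be able to freely choose the number of rows after the lead. How can I do this?
My code is:
To compute the lead: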
library(dplyr)
dt <- dt %>%
  group_by(id) %>%
  # flag a row when variable is 0 and the next row's variable (within id) is 1
  mutate(leadvar = ifelse(variable == 0 & lead(variable, default = 0) == 1, 1, 0))
I tried to select 2 rows after the lead, but it does not work:
dt %>% group_by(id) %>% mutate(V4 = variable + leadvar) %>% mutate(m = 1:n()) %>% filter(m < 3)
Data:
dt = structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"),
variable = c(0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1), lead = c(1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)), class = "data.frame", .Names = c("id", "variable", "lead"), row.names = c(NA, -12L))
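For illustration, here is a minimal sketch of one way the last step might be approached. It is not a definitive solution. It assumes each id has at most one flagged row of interest (the first one is used), and the names n_after, pos and start are hypothetical helpers that do not appear in the code above. The sample data is rebuilt with id as a plain character column so the block runs on its own.

```r
library(dplyr)

# Same values as the sample data above (id kept as character for brevity)
dt <- data.frame(
  id = rep(c("a", "b", "c", "d"), each = 3),
  variable = c(0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1)
)

# Hypothetical parameter: how many rows to keep after the flagged row
n_after <- 1

res <- dt %>%
  group_by(id) %>%
  mutate(
    # 1 when variable is 0 and the next row's variable (within id) is 1
    leadvar = as.integer(variable == 0 & lead(variable, default = 0) == 1),
    # row position inside the group
    pos = row_number(),
    # position of the first flagged row in the group, NA if there is none
    start = match(1L, leadvar)
  ) %>%
  # keep the flagged row plus the next n_after rows; groups without a flag drop out
  filter(!is.na(start), pos >= start, pos <= start + n_after) %>%
  ungroup() %>%
  select(id, variable, leadvar)

print(res)
```

With n_after set to 1, this keeps the flagged row and the row after it for ids a and b, and drops c and d because they have no flagged row. Raising n_after keeps more rows, limited by how many rows remain in the group.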