One possible way to do this with dplyr:
library(dplyr)

df %>%
  # Flag rows whose temperature is higher than in the previous row
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  # Give each run of identical flags its own group ID
  group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  # Keep the flag only for runs of at least 4 rows
  mutate(episode = (n() >= 4) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  # Same steps for decreasing temperature, then join both results
  left_join(df %>%
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              mutate(episode = (n() >= 4) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  # A row is part of an episode if either direction flagged it
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)
Patient Minute temperature episode
<int> <dbl> <dbl> <int>
1 1 0 35.6 0
2 1 1 35.6 0
3 1 2 35.7 1
4 1 3 35.7 1
5 1 4 35.7 1
6 1 5 35.7 1
7 1 6 35.7 0
8 1 7 35.7 0
9 1 8 35.7 0
10 1 9 35.7 0
11 1 10 35.7 0
12 1 11 35.7 0
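Note that I reduced the threshold from 15 to 4 (you can change it by modifying the number in n() >= 4), because your data don't contain enough rows to illustrate a run that long.
What the code does: first, it checks whether each row has a higher (and, in a separate pass, a lower) "temperature" value than the previous row. Second, it builds a run-length group ID over the result of that comparison, so consecutive rows with the same result share one ID. Third, if the condition holds for a run of at least n rows (4 in my code), it assigns 1 to the variable "episode". Finally, it combines the results of the two comparisons.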
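To see what the run ID does without dplyr, here is a minimal base-R sketch on a made-up temperature vector (the values are hypothetical, not your data). It computes the same "higher than the previous row" flag, the same rle()-based group ID, and keeps the flag only for runs of at least 4 rows; rep(r$lengths, r$lengths) plays the role that n() plays inside each group.

```r
# Hypothetical temperatures for one patient, one row per minute
temp <- c(35.6, 35.6, 35.7, 35.8, 35.9, 36.0, 35.9, 35.8, 35.7, 35.8, 35.9, 36.0)

# Step 1: compare with the previous row (like lag() with default = first())
prev <- c(temp[1], head(temp, -1))
up <- temp > prev

# Step 2: run-length group ID, the same expression used for rleid above
r <- rle(up)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Step 3: size of the run each row belongs to (what n() returns per group)
run_len <- rep(r$lengths, r$lengths)

# Keep the flag only for increasing runs of at least 4 rows
episode <- as.integer(run_len >= 4 & up)

print(data.frame(temp, up, run_id, run_len, episode))
```

The last increasing run in this toy vector is only three rows long, so its rows stay 0.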
Alternatively, if you also want to distinguish increasing from decreasing episodes:
df %>%
  # Increasing runs of at least 2 rows are labelled 1
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  mutate(episode = (n() >= 2) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  left_join(df %>%
              # Decreasing runs of at least 2 rows are labelled 2; shorter runs stay 0
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              mutate(episode = 2 * (n() >= 2) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  # A row cannot be both an increase and a decrease, so pmax() keeps the label that is set
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)
Patient Minute temperature episode
<int> <dbl> <dbl> <dbl>
1 1 0 35.6 0
2 1 1 35.6 0
3 1 2 35.7 1
4 1 3 35.7 1
5 1 4 35.7 1
6 1 5 35.7 1
7 1 6 35.7 2
8 1 7 35.7 2
9 1 8 35.7 2
10 1 9 35.7 1
11 1 10 35.7 1
12 1 11 35.7 1
Here, with a minimum run length of 2, "episode" == 1 marks an increase and "episode" == 2 marks a decrease.
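The 0/1/2 encoding can be reproduced in base R with a small helper; flag_runs() below is a hypothetical name, not part of dplyr, and the temperatures are made up. It multiplies the label by the run-length test and by the condition, so a row gets its label only when it sits inside a qualifying run, and pmax() merges the two directions.

```r
# Hypothetical helper: give `label` to rows in runs where cond is TRUE for >= min_len rows
flag_runs <- function(cond, min_len, label) {
  r <- rle(cond)
  run_len <- rep(r$lengths, r$lengths)  # size of the run each row belongs to
  label * (run_len >= min_len) * cond   # 0 outside qualifying runs
}

# Hypothetical temperatures, one row per minute
temp <- c(35.6, 35.6, 35.7, 35.8, 35.9, 36.0, 35.9, 35.8, 35.7, 35.8, 35.9, 36.0)
prev <- c(temp[1], head(temp, -1))

inc <- flag_runs(temp > prev, min_len = 2, label = 1)  # increases -> 1
dec <- flag_runs(temp < prev, min_len = 2, label = 2)  # decreases -> 2
episode <- pmax(inc, dec)
print(episode)
```

With this encoding, an isolated single-row decrease has a run length of 1 and therefore stays 0 rather than being confused with an increase.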
I assume you want to do this per "Patient", in which case you can group by it first:
df %>%
  # lag() and first() now operate within each patient
  group_by(Patient) %>%
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  # Grouping by Patient as well keeps runs from crossing patients
  group_by(Patient, rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  mutate(episode = (n() >= 2) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  left_join(df %>%
              group_by(Patient) %>%
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(Patient, rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              # Decreasing runs of at least 2 rows are labelled 2; shorter runs stay 0
              mutate(episode = 2 * (n() >= 2) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)
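The per-patient logic can also be sketched in base R with ave(), which applies a function to each patient's temperatures and returns a vector aligned with the original rows. The data frame toy and the helper flag_episodes() are hypothetical, and the sketch assumes rows are already sorted by Minute within each Patient.

```r
# Hypothetical helper: 1 = increasing run, 2 = decreasing run, 0 = neither
flag_episodes <- function(temp, min_len = 2) {
  prev <- c(temp[1], head(temp, -1))  # previous value; the first row compares with itself
  flag <- function(cond, label) {
    r <- rle(cond)
    label * (rep(r$lengths, r$lengths) >= min_len) * cond
  }
  pmax(flag(temp > prev, 1), flag(temp < prev, 2))
}

# Hypothetical data: two patients, rows sorted by Minute within Patient
toy <- data.frame(
  Patient = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Minute = c(0:4, 0:3),
  temperature = c(35.6, 35.7, 35.8, 35.7, 35.6, 36.0, 35.9, 36.0, 36.1)
)

# ave() runs flag_episodes() separately on each Patient's temperatures
toy$episode <- ave(toy$temperature, toy$Patient, FUN = flag_episodes)
print(toy)
```

Because each patient is processed on its own, patient 2's first reading is compared with itself rather than with patient 1's last reading.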