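I have a dataset in which I want to define "episodes". An episode is a period in which the temperature rises or falls continuously for at least 15 minutes. Is there a way to construct this without doing it manually?

Here is the structure of my data: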

Patient Minute temperature
1       0,00   35,65
1       1,00   35,65
1       2,00   35,66
1       3,00   35,67
1       4,00   35,70
1       5,00   35,72
1       6,00   35,71
1       7,00   35,68
1       8,00   35,66
1       9,00   35,67
1       10,00  35,69
1       11,00  35,72

Thanks in advance.

1 Answer

One possibility using dplyr:

library(dplyr)

df %>%
  # Flag rows whose temperature is higher than in the previous row
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  # Give every run of consecutive identical flags its own ID
  group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  # Keep the flag only for runs of at least 4 rows
  mutate(episode = (n() >= 4) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  # Repeat the same steps for decreasing temperature and join the result
  left_join(df %>%
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              mutate(episode = (n() >= 4) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  # A row belongs to an episode if either direction flagged it
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)

   Patient Minute temperature episode
     <int>  <dbl>       <dbl>   <int>
 1       1      0        35.6       0
 2       1      1        35.6       0
 3       1      2        35.7       1
 4       1      3        35.7       1
 5       1      4        35.7       1
 6       1      5        35.7       1
 7       1      6        35.7       0
 8       1      7        35.7       0
 9       1      8        35.7       0
10       1      9        35.7       0
11       1     10        35.7       0
12       1     11        35.7       0

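Note that I reduced the threshold from 15 minutes to 4 (you can change it by modifying the number in n() >= 4), because your sample data does not contain enough rows to illustrate a 15-minute run.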
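
To reproduce the output above, df can be built from the data in the question. The following is a minimal sketch, assuming the data are available as plain text with decimal commas. It also checks that consecutive rows are one minute apart, because n() counts rows and only corresponds to minutes under that assumption. The name one_minute_steps is introduced here purely for illustration; with several patients the check would need to be done per patient.

```r
# Data from the question; dec = "," converts the decimal commas
df <- read.table(text = "
Patient Minute temperature
1 0,00 35,65
1 1,00 35,65
1 2,00 35,66
1 3,00 35,67
1 4,00 35,70
1 5,00 35,72
1 6,00 35,71
1 7,00 35,68
1 8,00 35,66
1 9,00 35,67
1 10,00 35,69
1 11,00 35,72
", header = TRUE, dec = ",")

# n() counts rows, so a row threshold equals a minute threshold only
# if consecutive rows are exactly one minute apart
one_minute_steps <- all(diff(df$Minute) == 1)
print(one_minute_steps)
```

If the check holds for the real data, replacing 4 with 15 in n() >= 4 should match the 15-minute definition from the question.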

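It works as follows. First, it checks whether each row has a higher (or lower) temperature value than the previous row; the two conditions are handled separately. Second, it creates a run-length group ID around that comparison. Third, if a run that meets the condition spans at least n rows (4 in my code), it assigns 1 to a variable called "episode". Finally, it combines the results for the two conditions.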
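
The steps described above can be sketched in base R for the increasing case only. This is an illustration of the logic rather than a replacement for the dplyr pipeline; min_rows, rising, run_id, and run_len are names introduced here.

```r
# Sample temperatures from the question (decimal commas written as points)
temperature <- c(35.65, 35.65, 35.66, 35.67, 35.70, 35.72,
                 35.71, 35.68, 35.66, 35.67, 35.69, 35.72)
min_rows <- 4  # illustrative name for the run-length threshold

# Step 1: is each value higher than the previous one? (first row is FALSE)
rising <- c(FALSE, diff(temperature) > 0)

# Step 2: run-length encode the flags and give every run its own ID
runs   <- rle(rising)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Step 3: keep the flag only where the run has at least min_rows rows
run_len <- rep(runs$lengths, runs$lengths)
episode <- as.integer(rising & run_len >= min_rows)

print(run_id)   # 1 1 2 2 2 2 3 3 3 4 4 4
print(episode)  # 0 0 1 1 1 1 0 0 0 0 0 0
```

The last run of increases (rows 10 to 12) has only three rows, so it is not flagged with a threshold of 4.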

Or, if you also want to distinguish between increasing and decreasing episodes:

df %>%
  # Increasing runs of at least 2 rows are coded as 1
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  mutate(episode = (n() >= 2) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  left_join(df %>%
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              # Decreasing runs of at least 2 rows are coded as 2; shorter
              # runs stay 0 (the earlier formula coded them as 1)
              mutate(episode = 2 * (n() >= 2) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)

   Patient Minute temperature episode
     <int>  <dbl>       <dbl>   <dbl>
 1       1      0        35.6       0
 2       1      1        35.6       0
 3       1      2        35.7       1
 4       1      3        35.7       1
 5       1      4        35.7       1
 6       1      5        35.7       1
 7       1      6        35.7       2
 8       1      7        35.7       2
 9       1      8        35.7       2
10       1      9        35.7       1
11       1     10        35.7       1
12       1     11        35.7       1

Here, using a window of 2 rows, "episode" == 1 indicates an increase and "episode" == 2 indicates a decrease.
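
As an alternative sketch (not part of the dplyr pipeline above), the same 0/1/2 coding can be produced in one pass by run-length encoding the sign of the change between rows. label_episodes and min_rows are hypothetical names used only for this illustration, and the function assumes the rows of a single patient in time order.

```r
# Hypothetical helper: 0 = no episode, 1 = increase, 2 = decrease
label_episodes <- function(temperature, min_rows = 2) {
  # Direction of change relative to the previous row: 1, -1 or 0
  direction <- c(0, sign(diff(temperature)))
  # Length of the run each row belongs to
  runs    <- rle(direction)
  run_len <- rep(runs$lengths, runs$lengths)
  # Flat rows and runs that are too short are not episodes
  ifelse(run_len < min_rows | direction == 0, 0L,
         ifelse(direction > 0, 1L, 2L))
}

temperature <- c(35.65, 35.65, 35.66, 35.67, 35.70, 35.72,
                 35.71, 35.68, 35.66, 35.67, 35.69, 35.72)
episode <- label_episodes(temperature, min_rows = 2)
print(episode)  # 0 0 1 1 1 1 2 2 2 1 1 1
```

Because both directions are handled in a single vector, no join is needed, at the cost of leaving the dplyr idiom used in the rest of this answer.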

I suppose you want to do this per "Patient", so you can group by it:

df %>%
  # Compare each row with the previous row of the same patient
  group_by(Patient) %>%
  mutate(episode = temperature > lag(temperature, default = first(temperature))) %>%
  # Runs are identified within each patient
  group_by(Patient, rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
  mutate(episode = (n() >= 2) * episode) %>%
  ungroup() %>%
  select(-rleid) %>%
  left_join(df %>%
              group_by(Patient) %>%
              mutate(episode = temperature < lag(temperature, default = first(temperature))) %>%
              group_by(Patient, rleid = with(rle(episode), rep(seq_along(lengths), lengths))) %>%
              # Decreasing runs of at least 2 rows are coded as 2; shorter
              # runs stay 0 (the earlier formula coded them as 1)
              mutate(episode = 2 * (n() >= 2) * episode) %>%
              ungroup() %>%
              select(-rleid),
            by = c("Patient", "Minute", "temperature")) %>%
  mutate(episode = pmax(episode.x, episode.y)) %>%
  select(-episode.x, -episode.y)
Answered 2019-05-22T08:48:20.507