
Disclaimer: I'm not sure "collapse" is the right term for this operation. If there is a better term, I'm all ears.

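I have symptom severity data for several hundred patients, each observed multiple times over time. Severity is defined as an ordered factor. Here is a simplified example: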

# Create example dataset
example.dat <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),  # patient ID numbers
  Time = c("T1", "T2", "T3", "T1", "T2", "T3",  # times at which data were collected
           "T1", "T2", "T3", "T1", "T2", "T3"),
  Severity = c("Mild", "Moderate", "Mild",  # severity of symptoms
          "Severe", "Severe", "Moderate",
          "None", NA, "None",
          "Moderate", "Moderate", "Mild")
)

# Specify the order of the factor levels
example.dat$Severity <- ordered(example.dat$Severity,
                                levels = c("None",
                                           "Mild",
                                           "Moderate",
                                           "Severe")
                                )

example.dat

The resulting data frame looks like this:

   ID Time Severity
1   1   T1     Mild
2   1   T2 Moderate
3   1   T3     Mild
4   2   T1   Severe
5   2   T2   Severe
6   2   T3 Moderate
7   3   T1     None
8   3   T2     <NA>
9   3   T3     None
10  4   T1 Moderate
11  4   T2 Moderate
12  4   T3     Mild

I want to create a new column containing the worst symptom observed for each ID (i.e., the highest level of the ordered factor), like this:

   ID Time Severity    Worst
1   1   T1     Mild Moderate
2   1   T2 Moderate Moderate
3   1   T3     Mild Moderate
4   2   T1   Severe   Severe
5   2   T2   Severe   Severe
6   2   T3 Moderate   Severe
7   3   T1     None     None
8   3   T2     <NA>     None
9   3   T3     None     None
10  4   T1 Moderate Moderate
11  4   T2 Moderate Moderate
12  4   T3     Mild Moderate

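From there, I can easily create a subset of this data frame that contains, for each ID, the time of the most recent observation and the worst symptom reported during the study period: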

   ID Time    Worst
3   1   T3 Moderate
6   2   T3   Severe
9   3   T3     None
12  4   T3 Moderate

Any ideas?


3 Answers


You can use ave to find the maximum (worst) symptom for each ID:

example.dat$Worst <- ave(example.dat$Severity, example.dat$ID, 
                                      FUN = function(i) max(i, na.rm=TRUE)) 

The na.rm option is used because some IDs have missing values.

You can then subset to keep only the most recent time.
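
The subsetting step could be sketched in base R as follows. This is only one possible way to do it, and it assumes the rows are already sorted by ID and then Time, as in the example; the names is_last and latest are illustrative and not part of the original answer.

```r
# Rebuild the example data so this block runs on its own
example.dat <- data.frame(
  ID = rep(1:4, each = 3),
  Time = rep(c("T1", "T2", "T3"), times = 4),
  Severity = c("Mild", "Moderate", "Mild",
               "Severe", "Severe", "Moderate",
               "None", NA, "None",
               "Moderate", "Moderate", "Mild")
)
example.dat$Severity <- ordered(example.dat$Severity,
                                levels = c("None", "Mild", "Moderate", "Severe"))

# Worst severity per patient, ignoring missing observations
example.dat$Worst <- ave(example.dat$Severity, example.dat$ID,
                         FUN = function(i) max(i, na.rm = TRUE))

# Keep the last row of each ID (assumes rows are sorted by ID, then Time)
is_last <- !duplicated(example.dat$ID, fromLast = TRUE)
latest <- example.dat[is_last, c("ID", "Time", "Worst")]
latest
```

If the rows are not sorted, reorder them first, for example with example.dat[order(example.dat$ID, example.dat$Time), ]. Note that character labels such as "T10" sort before "T2", so a numeric or date column is safer for ordering.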

Answered 2014-07-28T22:18:17.730

Here is a solution using the aggregate function in R:

example.dat <- data.frame(
ID = c(1,1,1,2,2,2,3,3,3,4,4,4),  # patient ID numbers
Time = c("T1", "T2", "T3", "T1", "T2", "T3",  # times at which data were collected
       "T1", "T2", "T3", "T1", "T2", "T3"),
Severity = c("Mild", "Moderate", "Mild",  # severity of symptoms
      "Severe", "Severe", "Moderate",
      "None", NA, "None",
      "Moderate", "Moderate", "Mild")
)

# Specify the order of the factor levels
example.dat$Severity <- ordered(example.dat$Severity,
                            levels = c("None",
                                       "Mild",
                                       "Moderate",
                                       "Severe")
                            )


new <- aggregate(Severity ~ ID , data = example.dat, FUN = max)
names(new)[names(new) == "Severity"] <- "Worst"
(final <- merge(example.dat, new))
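
One thing to keep in mind with this approach: the formula interface of aggregate drops rows with NA by default (na.action = na.omit), and merge performs an inner join by default. With the example data this is harmless, but a patient whose severity is missing at every visit would disappear from the merged result. The sketch below illustrates this with a made-up patient 5; the data and the names dat, worst, inner and kept are hypothetical.

```r
# Hypothetical data: patient 5 has no recorded severity at all
dat <- data.frame(
  ID = c(1, 1, 5, 5),
  Severity = c("Mild", "Severe", NA, NA)
)
dat$Severity <- ordered(dat$Severity,
                        levels = c("None", "Mild", "Moderate", "Severe"))

# Rows with NA are removed before aggregating, so ID 5 is absent here
worst <- aggregate(Severity ~ ID, data = dat, FUN = max)
names(worst)[names(worst) == "Severity"] <- "Worst"

# Default merge is an inner join: patient 5 is dropped entirely
inner <- merge(dat, worst)

# all.x = TRUE keeps every row of dat; Worst is NA for patient 5
kept <- merge(dat, worst, all.x = TRUE)
kept
```

Whether dropping such patients is acceptable depends on the analysis; all.x = TRUE simply makes the choice explicit.
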
Answered 2014-07-28T23:34:13.783

Using dplyr:

library(dplyr)
res <- example.dat %>%
  group_by(ID) %>%
  mutate(Worst = Severity[which.max(Severity)])

res
#Source: local data frame [12 x 4]
# Groups: ID

#    ID Time Severity    Worst
# 1   1   T1     Mild Moderate
# 2   1   T2 Moderate Moderate
# 3   1   T3     Mild Moderate
# 4   2   T1   Severe   Severe
# 5   2   T2   Severe   Severe
# 6   2   T3 Moderate   Severe
# 7   3   T1     None     None
# 8   3   T2       NA     None
# 9   3   T3     None     None
# 10  4   T1 Moderate Moderate
# 11  4   T2 Moderate Moderate
# 12  4   T3     Mild Moderate

filter(res, Time == "T3") %>% select(-Severity)
#Source: local data frame [4 x 3]
#Groups: ID
#   ID Time    Worst
# 1  1   T3 Moderate
# 2  2   T3   Severe
# 3  3   T3     None
# 4  4   T3 Moderate

Or data.table:

library(data.table) ## 1.9.3
setDT(example.dat)[,Worst := Severity[which.max(Severity)], by=ID]    
example.dat

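This requires version 1.9.3, the latest version at the time of writing. If you would rather use the CRAN version, 1.9.2, there is a small bug with factors that has to be worked around (it is fixed in 1.9.3):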

library(data.table) ## 1.9.2 from CRAN
setDT(example.dat)[, Worst := as.character(Severity)]
example.dat[, Worst := Worst[which.max(Severity)], by=ID]

Assuming the data set is already sorted by ID and Time, this gives you the final solution directly:

require(data.table) ## 1.9.3
setDT(example.dat)[, list(Time=Time[.N], Worst=Severity[which.max(Severity)]), by=ID]
#    ID Time    Worst
# 1:  1   T3 Moderate
# 2:  2   T3   Severe
# 3:  3   T3     None
# 4:  4   T3 Moderate

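setDT converts the data.frame to a data.table. Then we group by ID and take the last value of Time in each group using .N, a length-1 integer vector containing the number of observations in that group. Likewise, we subset Severity at the position of its maximum value.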
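
Both the dplyr and data.table versions index with which.max(Severity). which.max skips NA values, which is why patient 3 works, but it returns a zero-length result when every value in a group is NA, and the subset then has length zero instead of one. The base-R sketch below shows that behaviour and one possible guard; worst_level is a hypothetical helper, not a function from either package.

```r
sev_levels <- c("None", "Mild", "Moderate", "Severe")
some_missing <- ordered(c("None", NA, "Mild"), levels = sev_levels)
all_missing  <- ordered(c(NA, NA), levels = sev_levels)

# NA entries are ignored when at least one value is present
which.max(some_missing)

# With nothing but NA there is no maximum position at all
which.max(all_missing)

# Hypothetical helper: always returns exactly one value,
# falling back to NA when the whole group is missing
worst_level <- function(x) {
  i <- which.max(x)
  if (length(i) == 0L) x[NA_integer_] else x[i]
}

worst_level(some_missing)
worst_level(all_missing)
```

Under that assumption, Severity[which.max(Severity)] in the calls above could be swapped for worst_level(Severity) if some patients may have no recorded severity.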

Answered 2014-07-29T07:55:13.850