r - Filter the minimum value within a range in R

Question

I have a data frame that looks like this.

df <- data.frame (ptid  = c(1,1,1,1, 1, 2,2,2,3,3,3, 3),
              labid = c("CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE", "CRE","CRE", "CRE", "CRE"),
              age = c(50, 54, 50.7,  51.3, 51, 52, 35, 37, 46, 46.1, 46.1, 46.1))

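Within the same participant (same ptid), when several rows have ages within 2.0 years of each other, I want to keep only one of them (the row with the smallest age).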

This is what I would like my result to look like:

result <- data.frame(ptid = c(1,1,2,2,3),
                     labid = c("CRE", "CRE", "CRE", "CRE", "CRE"),
                     age = c(50,54,52,35,46))

Thanks in advance for your help! I have really been struggling with this!

score 1 · Accepted Answer

We can arrange the rows and then use diff inside filter:

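library(dplyr)
df %>%
   # sort by age within each participant so diff() sees ascending ages
   arrange(ptid, age) %>%
   group_by(ptid) %>%
   # keep the first row, plus every row more than 2 years after the previous row
   filter(c(first(age), diff(age)) > 2) %>%
   ungroup()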

-output

# A tibble: 5 x 3
#   ptid labid   age
#  <dbl> <chr> <dbl>
#1     1 CRE      50
#2     1 CRE      54
#3     2 CRE      35
#4     2 CRE      52
#5     3 CRE      46
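
To see why this filter works, it may help to look at the logical vector it builds for a single participant. The following is a minimal base R sketch using ptid 1's ages from the question; the names gaps and keep are illustrative and not part of the answer.

```r
# Ages of ptid 1 from the question, sorted the way arrange() would sort them
age <- sort(c(50, 54, 50.7, 51.3, 51))   # 50.0 50.7 51.0 51.3 54.0

# Gap to the previous row. The first element is the first age itself,
# which passes the `> 2` test only because the ages here are all above 2.
gaps <- c(age[1], diff(age))

keep <- gaps > 2
age[keep]   # 50 54
```

Note that each row is compared with the row directly before it, not with the last row that was kept. If ages at or below 2 were possible, putting Inf in place of the first age would probably be a safer way to always keep the first row.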

score 1 · Accepted Answer

You can do it like this:

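library(dplyr)
df %>%
  group_by(ptid) %>%
  arrange(ptid, age) %>%
  # cumsum(c(0, diff(age))) is each age's distance from the participant's minimum;
  # the outer cumsum bumps grp at each row where that distance exceeds 2
  mutate(grp = cumsum(cumsum(c(0, diff(age))) > 2)) %>%
  group_by(ptid, grp) %>%
  slice(1) %>%   # keep the first (smallest-age) row of each grp
  ungroup() %>%
  select(-grp)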
# A tibble: 5 x 3
   ptid labid   age
  <dbl> <chr> <dbl>
1     1 CRE      50
2     1 CRE      54
3     2 CRE      35
4     2 CRE      52
5     3 CRE      46
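
The nested cumsum is easier to follow when the intermediate vectors are spelled out. Here is a small base R sketch for ptid 2's ages from the question; offset is a hypothetical name, and duplicated() stands in for the group_by/slice(1) step.

```r
# Ages of ptid 2 from the question, sorted ascending
age <- sort(c(52, 35, 37))          # 35 37 52

# Inner cumsum: distance of each age from the smallest age
offset <- cumsum(c(0, diff(age)))   # 0 2 17

# Outer cumsum: the counter increases at every row whose distance exceeds 2
grp <- cumsum(offset > 2)           # 0 0 1

# Keeping the first row of each grp mirrors slice(1)
age[!duplicated(grp)]               # 35 52
```

Because the counter increases at every row more than 2 years from the minimum, each such row ends up in its own group. That gives the expected result for the question's data, but it may keep more rows than intended if several later ages sit close to each other.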

score 0 · Accepted Answer

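Define a function f that either uses cut to bin the input into intervals and identifies duplicates or, if that fails, falls back to keeping the minimum element. Then apply f within each ptid level using ave with a negation (!); the number of intervals passed to cut is the difference between the maximum and minimum of age, divided by 2.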

f <- function(x, ...) tryCatch(duplicated(cut(x, ...)), error=function(e) order(x) > 1)

res <- subset(df, !ave(age, ptid, FUN=function(x) f(x, diff(range(x)) / 2)))
res
#   ptid labid age
# 1    1   CRE  50
# 2    1   CRE  54
# 6    2   CRE  52
# 7    2   CRE  35
# 9    3   CRE  46

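Notes:
1. The order of the observations is not changed.
2. This solution removes duplicates, i.e. ties, when a ptid-labid combination has several rows with the same age. (If for any reason this is not desired, look at rank() instead of order().)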
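
As a rough illustration of the rank() versus order() remark, the sketch below compares the two on a small made-up vector. The ages are hypothetical, and ties.method = "min" is an assumption, since the note does not say which tie handling to use.

```r
# Hypothetical ages for one participant: unsorted, with a tie at the minimum
x <- c(46.1, 46, 46)

# order() returns the indices that would sort x, not each element's rank
order(x)                        # 2 3 1

# rank() with ties.method = "min" gives both tied minima rank 1
rank(x, ties.method = "min")    # 3 1 1

# Flag everything that is not a minimum, then keep the rest
drop_by_rank <- rank(x, ties.method = "min") > 1
x[!drop_by_rank]                # 46 46
```

With this choice both tied minima survive, and the result no longer depends on the order in which the rows happen to be stored.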
