r - How to subset groups that have complete consecutive months

Question

I am trying to create a subset of groups in R that have complete consecutive months.

For example, given data like the following:

structure(list(Group = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c(3, 
4, 7, 8, 1, 2, 3, 4)), class = "data.frame", row.names = c(NA, 
-8L), codepage = 65001L)

As a table, this looks like:

╔═══════╦═══════╗
║ Group ║ Month ║
╠═══════╬═══════╣
║ 1     ║ 3     ║
╠═══════╬═══════╣
║ 1     ║ 4     ║
╠═══════╬═══════╣
║ 1     ║ 7     ║
╠═══════╬═══════╣
║ 1     ║ 8     ║
╠═══════╬═══════╣
║ 2     ║ 1     ║
╠═══════╬═══════╣
║ 2     ║ 2     ║
╠═══════╬═══════╣
║ 2     ║ 3     ║
╠═══════╬═══════╣
║ 2     ║ 4     ║
╚═══════╩═══════╝

I want to drop Group 1, because there is a gap in its consecutive months (months 5 and 6 are missing).

score 1 · Accepted Answer

You can use a base R solution with ave, i.e.

df[!!with(df, ave(Month, Group, FUN = function(i)all(diff(i) == 1))),]

#  Group Month
#5     2     1
#6     2     2
#7     2     3
#8     2     4
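
To see why the one-liner above works, here is a rough step-by-step sketch of the same idea. The intermediate names (gaps, flag, keep, result) are only for illustration and are not part of the original answer; the data frame is rebuilt from the values in the question.

```r
# Same values as the data in the question
df <- data.frame(Group = c(1, 1, 1, 1, 2, 2, 2, 2),
                 Month = c(3, 4, 7, 8, 1, 2, 3, 4))

# Differences between successive months within each group.
# A group with no gap has only 1s here.
gaps <- tapply(df$Month, df$Group, diff)

# ave() returns one value per row. Month is numeric, so the
# TRUE/FALSE from all() is stored as 1/0 in the result.
flag <- ave(df$Month, df$Group, FUN = function(i) all(diff(i) == 1))

# !! turns the 1/0 vector back into a logical row index
keep <- !!flag
result <- df[keep, ]
```

The double negation is just a compact way to get a logical index; as.logical(flag) would do the same job.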

score 0 · Accepted Answer

One dplyr option could be:

library(dplyr)

# Keep only the groups whose successive months all differ by exactly 1
df %>%
 group_by(Group) %>%
 filter(all(diff(Month) == 1))

  Group Month
  <dbl> <dbl>
1     2     1
2     2     2
3     2     3
4     2     4

score 0 · Accepted Answer

Here is a base R option using subset + ave

> subset(df,as.logical(ave(Month,Group, FUN = function(x) all(diff(x)==1))))
  Group Month
5     2     1
6     2     2
7     2     3
8     2     4
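
One thing to keep in mind: diff() compares rows in the order they appear, so the check above relies on the months already being sorted within each group, as they are in the question's data. If that might not hold, a possible variant is to sort inside the function. This is only a sketch under that assumption; df_unsorted and is_complete are made-up names for illustration.

```r
# Hypothetical data: same months as in the question, but shuffled
df_unsorted <- data.frame(Group = c(2, 2, 2, 2, 1, 1, 1, 1),
                          Month = c(4, 2, 1, 3, 8, 3, 7, 4))

# Sort the months of a group first, then require every step to be 1
is_complete <- function(m) all(diff(sort(m)) == 1)

# Rows of complete groups are kept in their original order
kept <- subset(df_unsorted,
               as.logical(ave(Month, Group, FUN = is_complete)))
```

With this version a duplicated month in a group gives a difference of 0, so that group would be dropped as well.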

score 0 · Accepted Answer

Comparing the number of observations in each group with the number of differences equal to 1 also works:

library(tidyverse)
#Code
df %>% group_by(Group) %>%
  mutate(Diff=c(1,diff(Month)),
         Value=n()==sum(Diff==1)) %>%
  filter(Value) %>% ungroup() %>% select(-c(Value,Diff))

Output:

# A tibble: 4 x 2
  Group Month
  <dbl> <dbl>
1     2     1
2     2     2
3     2     3
4     2     4

Some data used:

#Data
df <- structure(list(Group = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c(3, 
4, 7, 8, 1, 2, 3, 4)), class = "data.frame", row.names = c(NA, 
-8L), codepage = 65001L)
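
For anyone who wants the same count comparison without the tidyverse, a rough base R sketch might look like this. The names n_rows, n_ok, complete and result are just illustrative, and the data frame is rebuilt from the same values as above.

```r
df <- data.frame(Group = c(1, 1, 1, 1, 2, 2, 2, 2),
                 Month = c(3, 4, 7, 8, 1, 2, 3, 4))

# Number of rows per group
n_rows <- tapply(df$Month, df$Group, length)

# Number of differences equal to 1 per group; the leading 1 stands in
# for the first row, which has no previous month to compare against
n_ok <- tapply(df$Month, df$Group, function(m) sum(c(1, diff(m)) == 1))

# A group is complete when both counts agree
complete <- n_rows == n_ok

# Keep only the rows that belong to complete groups
result <- df[df$Group %in% as.numeric(names(complete)[complete]), ]
```

Group 1 has 4 rows but only 3 differences equal to 1 (the jump from 4 to 7 gives 3), so the counts disagree and the group is dropped.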
