
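I have a dataset with 5 IDs, spanning from 01-01-2010 to 12-31-2013. I first `split` the data by ID, which gives me a list object. Then I create another list that builds 10-day intervals and groups them by ID.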

I want to nest these intervals into the first list (the list of IDs), so that each interval sits under the ID it is labeled with.

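For example, the main list has the IDs as its elements, and [1], [2], [3] are the intervals nested inside each of them. All intervals under [A] belong to ID A, all intervals under [B] belong to ID B, those under [C] belong to ID C, and so on: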

[A]
   [1]
   [2]
   [3]
[B]
   [1]
   [2]
   [3]
[C]
   [1]
   [2]
   [3]
[D]
   [1]
   [2]
   [3]
[E]
   [1]
   [2]
   [3]

The code below nests the intervals into the ID list, but it nests all of the IDs instead of only the specific ID each interval belongs to.

set.seed(12345)
library(lubridate)
library(tidyverse)

date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
ID <- rep(c("A","B","C","D", "E"), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df_ID <- split(df, df$ID)


df_nested <- lapply(df_ID, function(x){
  x %>%
    arrange(ID) %>% 
    # Creates a new column assigning the first day in the 10-day interval in which
    # the date falls under (e.g., 01-01-2010 would be in the first 10-day interval
    # so the `floor_date` assigned to it would be 01-01-2010)
    mutate(new = floor_date(date, "10 days")) %>%
    # For any month that has 31 days, the 31st would otherwise be assigned its
    # own interval. The line below moves the 31st into the previous interval.
    mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
    group_by(new, .add = TRUE) %>%
    group_split()
})
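The two `mutate()` steps above boil down to a day-of-month rule: days 1 to 10 start on the 1st, days 11 to 20 on the 11th, and days 21 to 31 on the 21st. The answer's output further down (for example 2010-01-31 mapped to 2010-01-21) is consistent with that reading. Below is a base-R sketch of the rule, handy for checking the `floor_date()`/`if_else()` result. `ten_day_start()` is a hypothetical helper, not part of lubridate or the question.

```r
# Hypothetical helper (not from the question or lubridate): returns the first
# day of the 10-day block a date falls in. Blocks start on the 1st, 11th and
# 21st of each month; day 31 is folded into the block that starts on the 21st.
ten_day_start <- function(date) {
  date <- as.Date(date)
  dom <- as.integer(format(date, "%d"))                 # day of month, 1..31
  start_dom <- ifelse(dom <= 10, 1L, ifelse(dom <= 20, 11L, 21L))
  date - (dom - start_dom)                              # step back to block start
}

ten_day_start(as.Date(c("2010-01-01", "2010-01-06", "2010-01-11",
                        "2010-01-31", "2010-02-05", "2010-02-15")))
# values: 2010-01-01, 2010-01-01, 2010-01-11, 2010-01-21, 2010-02-01, 2010-02-11
```

Short months need no special case. In February, days 21 to 28 (or 29) all map to the 21st.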

1 Answer


I would do it like this:

set.seed(12345)
library(lubridate)
library(tidyverse)

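f = function(data){
  data %>% mutate(
    # start of the 10-day interval the date falls in (1st, 11th, 21st or 31st)
    new = floor_date(date, "10 days"),
    # fold the 31st into the interval that starts on the 21st
    new = if_else(day(new) == 31, new - days(10), new)
  )
}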

tibble(
  ID = rep(c("A","B","C","D", "E"), 100),
  date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
  x = runif(length(date), min = 60000, max = 80000),
  y = runif(length(date), min = 800000, max = 900000)
) %>% group_by(ID) %>% 
  nest() %>% 
  mutate(data = map(data, f)) %>% 
  unnest(data)

Output

# A tibble: 500 x 5
# Groups:   ID [5]
   ID    date            x       y new       
   <chr> <date>      <dbl>   <dbl> <date>    
 1 A     2010-01-01 74418. 820935. 2010-01-01
 2 A     2010-01-06 63327. 885896. 2010-01-01
 3 A     2010-01-11 60691. 873949. 2010-01-11
 4 A     2010-01-16 69250. 868411. 2010-01-11
 5 A     2010-01-21 69075. 876142. 2010-01-21
 6 A     2010-01-26 67797. 829892. 2010-01-21
 7 A     2010-01-31 75860. 843542. 2010-01-21
 8 A     2010-02-05 67233. 882318. 2010-02-01
 9 A     2010-02-10 75644. 826283. 2010-02-01
10 A     2010-02-15 66424. 853789. 2010-02-11
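If you still want the two-level list from the question (IDs first, then intervals), one option is to split a flat result like this one twice with base R `split()`. The sketch below assumes a table with `ID` and `new` columns. `flat` and `nested` are hypothetical names, and the toy rows are made up for illustration.

```r
# Toy stand-in (made-up data) for a flat result with an ID column and the
# 10-day interval start `new` already computed.
flat <- data.frame(
  ID   = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2010-01-01", "2010-01-06", "2010-01-11",
                   "2010-01-02", "2010-01-12")),
  new  = as.Date(c("2010-01-01", "2010-01-01", "2010-01-11",
                   "2010-01-01", "2010-01-11"))
)

# Outer split: one element per ID. Inner split: that ID's rows by interval
# start. The inner split only ever sees one ID's rows, so every interval
# stays under the ID it belongs to.
nested <- lapply(split(flat, flat$ID), function(d) split(d, d$new))

names(nested)                         # "A" "B"
names(nested[["A"]])                  # "2010-01-01" "2010-01-11"
nrow(nested[["A"]][["2010-01-01"]])   # 2
```

The inner names come from the interval start dates, so `nested[["A"]][["2010-01-11"]]` reads as "ID A, interval starting 2010-01-11".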

Simple and clear, isn't it?

Everything you want to do with each ID's data goes inside the function `f`. You can extend it as needed.

The rest follows the simple pattern `tibble %>% group_by %>% nest %>% mutate %>% unnest`.
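For readers without the tidyverse, the same split-apply-combine idea can be sketched in base R. This is a swapped-in analogue, not what the answer uses: `split()` plays the role of `group_by` + `nest`, `lapply()` the role of `mutate(data = map(data, f))`, and `do.call(rbind, ...)` the role of `unnest`. `add_row_index()` is a hypothetical per-group function standing in for `f`.

```r
# Hypothetical per-group function standing in for `f`: numbers the rows
# within each ID.
add_row_index <- function(d) {
  d$row_in_id <- seq_len(nrow(d))
  d
}

df <- data.frame(ID = c("A", "B", "A", "B", "A"),
                 x  = c(10, 20, 30, 40, 50))

pieces <- lapply(split(df, df$ID), add_row_index)  # ~ nest + map(data, f)
out <- do.call(rbind, pieces)                      # ~ unnest
rownames(out) <- NULL                              # drop "A.1"-style row names
out
```

The rows come back grouped by ID (A rows first, then B), each numbered from 1 within its group, much like the `# Groups: ID [5]` output above keeps rows together per ID.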

answered 2021-09-03T21:13:28.253