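I want to generate a complete (balanced) monthly panel time series.

I tried tsibble, which works well for large data, but for small datasets with a lot of missing data it seems to choose a very wide interval.

Also, to make it easy to compare many different sets, I would like to specify the start and end dates.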

library(dplyr)
data <- structure(list(
  month = structure(c(18078, 18201), class = "Date"), 
  account = c("3125", "3100"), 
  sum = c(-21.0084, -2000)), 
  class = c("tbl_df", "tbl", "data.frame"), 
  row.names = c(NA, -2L))

data %>% 
  mutate(month = tsibble::yearmonth(month)) %>%
  tsibble::as_tsibble(key = account, index = month) %>%
  tsibble::fill_gaps(sum = 0, .full = T)

Here is a minimal example, which results in:

# A tibble: 4 x 3
     month account     sum
     <mth> <chr>     <dbl>
1 2019 Jul 3100        0  
2 2019 Nov 3100    -2000  
3 2019 Jul 3125      -21.0
4 2019 Nov 3125        0  

But it should run from May to December, with 0 for every missing month in each group (account).

1 Answer

library(dplyr, warn.conflicts = FALSE)
library(tsibble, warn.conflicts = FALSE)

data <- structure(list(
      month = structure(c(18078, 18201), class = "Date"),
      account = c("3125", "3100"),
      sum = c(-21.0084, -2000)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L))

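data %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(key = account, index = month) %>%
  # add one row per month of the desired range (account is NA on the new rows)
  full_join(
    tibble(
      # same yearmonth type as the index so the join keys match
      month = yearmonth(seq(as.Date("2019-05-01"), as.Date("2019-12-01"), by = "1 month"))
    )
  ) %>%
  fill_gaps(sum = 0, .full = TRUE) %>%
  # drop the placeholder rows that only served to widen the range
  filter(!is.na(account)) %>%
  print(n = 20)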
#> Joining, by = "month"
#> # A tsibble: 16 x 3 [1M]
#> # Key:       account [2]
#>       month account     sum
#>       <mth> <chr>     <dbl>
#>  1 2019 May 3100        0  
#>  2 2019 Jun 3100        0  
#>  3 2019 Jul 3100        0  
#>  4 2019 Aug 3100        0  
#>  5 2019 Sep 3100        0  
#>  6 2019 Oct 3100        0  
#>  7 2019 Nov 3100    -2000  
#>  8 2019 Dec 3100        0  
#>  9 2019 May 3125        0  
#> 10 2019 Jun 3125        0  
#> 11 2019 Jul 3125      -21.0
#> 12 2019 Aug 3125        0  
#> 13 2019 Sep 3125        0  
#> 14 2019 Oct 3125        0  
#> 15 2019 Nov 3125        0  
#> 16 2019 Dec 3125        0

Created on 2020-01-15 by the reprex package (v0.3.0)
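
If a plain tibble is enough and the tsibble structure is not needed, the same fixed May to December range could probably also be built with tidyr::complete(). This is only a sketch of an alternative, not what the code above does: it assumes every date in month already falls on the first of its month (as in the example data), and the name panel is just illustrative.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Same two observations as in the question (2019-07-01 and 2019-11-01)
data <- tibble(
  month   = as.Date(c("2019-07-01", "2019-11-01")),
  account = c("3125", "3100"),
  sum     = c(-21.0084, -2000)
)

# Assumed range: explicit start and end, one row per month
all_months <- seq(as.Date("2019-05-01"), as.Date("2019-12-01"), by = "1 month")

# Expand to every account x month combination; missing sums become 0
panel <- data %>%
  complete(account, month = all_months, fill = list(sum = 0))

print(panel, n = 20)
```

This should give 16 rows (2 accounts x 8 months), and because the range is set explicitly it stays the same across datasets, which makes them easier to compare.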

Answered 2020-01-14T23:38:15.920