
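I'm trying to use `complete` on a data frame with several categorical variables, using the `nesting` function so as to create a coherent time series for each combination of categorical variables present in the data.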

Here is a sample data frame:

```r
# dput(df) output, assigned to df so it can be pasted into a session
df <- structure(list(ds = structure(c(1546300800, 1546387200, 1546473600, 
1546560000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    y = c(40, 40, 40, 40), type = c("a", "a", "a", "b"), city = c("x", 
    "x", "x", "y"), hid = c(1, 2, 2, 3)), row.names = c(NA, -4L
), na.action = structure(c(`5` = 5L), class = "omit"), class = c("tbl_df", 
"tbl", "data.frame"))

# Find the date range
min_date <- min(df$ds)
max_date <- max(df$ds)
dates_seq <- seq.POSIXt(from = min_date, 
                        to = max_date, 
                        by = '1 day')
```

Here is what I tried, which gives the expected result:

```r
library(tidyverse)

df %>%
    complete(nesting(type, city, hid), 
             ds = dates_seq, 
             fill = list(y = 0))

# A tibble: 12 x 5
#   type  city    hid ds                      y
#   <chr> <chr> <dbl> <dttm>              <dbl>
# 1 a     x         1 2019-01-01 00:00:00    40
# 2 a     x         1 2019-01-02 00:00:00     0
# 3 a     x         1 2019-01-03 00:00:00     0
# 4 a     x         1 2019-01-04 00:00:00     0
# 5 a     x         2 2019-01-01 00:00:00     0
# 6 a     x         2 2019-01-02 00:00:00    40
# 7 a     x         2 2019-01-03 00:00:00    40
# 8 a     x         2 2019-01-04 00:00:00     0
# 9 b     y         3 2019-01-01 00:00:00     0
#10 b     y         3 2019-01-02 00:00:00     0
#11 b     y         3 2019-01-03 00:00:00     0
#12 b     y         3 2019-01-04 00:00:00    40
```

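How can I pass these columns to `nesting` if I don't know in advance which columns of `df` are the categorical variables? My assumption is that every instance of `df` contains at least the two columns `ds` and `y`.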


Edit: I also tried the following, which throws an error:

```r
complete(df, 
    nesting(names(df)[!(names(df) %in% c("ds", "y"))]), 
    ds = dates_seq, 
    fill = list(y = 0))
```

2 Answers


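We can use the `rlang` package. Convert `names(df)[!names(df) %in% c("ds", "y")]` to symbols with `syms` (plural, because there are several columns), store the result in a variable, and then splice it into the `nesting` function with `!!!`.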

```r
library(tidyverse)
library(rlang)

ne <- syms(names(df)[!names(df) %in% c("ds", "y")])

df %>%
  complete(nesting(!!!ne), 
           ds = dates_seq, 
           fill = list(y = 0))
# # A tibble: 12 x 5
#    type  city    hid ds                      y
#    <chr> <chr> <dbl> <dttm>              <dbl>
#  1 a     x         1 2019-01-01 00:00:00    40
#  2 a     x         1 2019-01-02 00:00:00     0
#  3 a     x         1 2019-01-03 00:00:00     0
#  4 a     x         1 2019-01-04 00:00:00     0
#  5 a     x         2 2019-01-01 00:00:00     0
#  6 a     x         2 2019-01-02 00:00:00    40
#  7 a     x         2 2019-01-03 00:00:00    40
#  8 a     x         2 2019-01-04 00:00:00     0
#  9 b     y         3 2019-01-01 00:00:00     0
# 10 b     y         3 2019-01-02 00:00:00     0
# 11 b     y         3 2019-01-03 00:00:00     0
# 12 b     y         3 2019-01-04 00:00:00    40
```
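If the same completion has to be applied to many data frames, the `syms()` approach can be wrapped in a function. This is a minimal sketch: the name `complete_daily` and its `time_col`, `value_col` and `fill_value` arguments are hypothetical (not part of tidyr or rlang), and it assumes a POSIXct time column with daily steps and a numeric value column whose gaps should become 0.

```r
library(tidyr)
library(rlang)

# Hypothetical helper (not part of tidyr): completes a daily series for every
# observed combination of the remaining (key) columns.
complete_daily <- function(data, time_col = "ds", value_col = "y", fill_value = 0) {
  # Every column that is neither the time column nor the value column is a key
  key_cols <- syms(setdiff(names(data), c(time_col, value_col)))

  # Full daily sequence spanning the observed time range
  times <- data[[time_col]]
  all_times <- seq(min(times), max(times), by = "1 day")

  # complete() expects fill as a named list, e.g. list(y = 0)
  fill_list <- set_names(list(fill_value), value_col)

  data %>%
    complete(nesting(!!!key_cols),
             !!time_col := all_times,
             fill = fill_list)
}
```

Called as `complete_daily(df)` on the question's data, it should give the same 12 rows as above. The `!!time_col := all_times` form uses rlang's name injection, so the time column name does not have to be hard-coded.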
answered 2019-01-29T10:14:39.813

Here is another way to solve this, applying the `!!!` operator directly to the result of `select()`:

```r
# select() drops ds and y; !!! splices the remaining columns into nesting()
df %>%
    complete(nesting(!!!select(df, -ds, -y)), 
             ds = dates_seq, 
             fill = list(y = 0))
```
answered 2019-01-29T10:49:00.063
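Both answers splice the key columns into `nesting()`. A variant that avoids `nesting()` is to group by every column except `ds` and `y` and let `complete()` fill in the dates within each group. This is a sketch not taken from either answer; it assumes dplyr >= 1.0 (for `across()` inside `group_by()`) and relies on `complete()` operating per group on a grouped data frame.

```r
library(dplyr)
library(tidyr)

# The question's sample data, rebuilt with tibble() instead of dput()
df <- tibble(
  ds = as.POSIXct(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), tz = "UTC"),
  y = c(40, 40, 40, 40),
  type = c("a", "a", "a", "b"),
  city = c("x", "x", "x", "y"),
  hid = c(1, 2, 2, 3)
)
dates_seq <- seq(min(df$ds), max(df$ds), by = "1 day")

res <- df %>%
  # Group by every column except ds and y, whatever the key columns are called
  group_by(across(-c(ds, y))) %>%
  # Within each group, add the missing dates and fill the new y values with 0
  complete(ds = dates_seq, fill = list(y = 0)) %>%
  ungroup()
```

Because only key combinations that already occur in the data form groups, this should produce the same 12 rows as the `nesting()` versions; the final `ungroup()` simply drops the grouping again.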