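I am trying to write a simple wrapper that summarise()s arbitrary variables by arbitrary groups. I have made progress now that I have the correct library version loaded, but I am (again) confused about how to unquote arguments that take multiple values.

I currently have the following function...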

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- enquo(group)
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
           group_by(!!quo_group, variable) %>%
           summarise(n    = n(),
                     mean = mean(value, na.rm = TRUE))
    return(results)
}

It gets most of the way there and works if I specify a single grouping variable...

> table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
# A tibble: 3 x 4
# Groups:   c(gear) [?]
       gear variable     n     mean
      <dbl>    <chr> <int>    <dbl>
1         3      mpg    15 16.10667
2         4      mpg    12 24.53333
3         5      mpg     5 21.38000

...but it fails at group_by(!!quo_group, variable) when I specify more than one, group = c(gear, hp)...

> mtcars$model <- rownames(mtcars)
> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
Error in mutate_impl(.data, dots) : 
  Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

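I went back and re-read the Programming with dplyr documentation, where I read that you can capture multiple variables using quos() rather than enquo() and then unquote-splice them with !!!, so I tried...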

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    quo_group  <- quos(group)  ## Use quos() rather than enquo()
    UQS(quo_group) %>% print() ## Check to see what quo_group holds
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...and now it fails at the first reference to !!!quo_group within dplyr::select(), regardless of how many variables are specified under group = ...

> table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
[[1]]
<quosure: frame>
~group

attr(,"class")
[1] "quosures"
Error in overscope_eval_next(overscope, expr) : object 'gear' not found
> traceback()
17: .Call(rlang_eval, f_rhs(quo), overscope)
16: overscope_eval_next(overscope, expr)
15: FUN(X[[i]], ...)
14: lapply(.x, .f, ...)
13: map(.x[matches], .f, ...)
12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
11: select_vars(names(.data), !(!(!quos(...))))
10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
8: function_list[[i]](value)
7: freduce(value, `_function_list`)
6: `_fseq`(`_lhs`)
5: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>% 
       unique()
1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))

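What seems strange, and what I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints out ~gear rather than a list of quosures, which is what adding a print() to the worked example from the documentation shows happening...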

> my_summarise <- function(df, ...) {
    group_by <- quos(...)
    UQS(group_by) %>% print()
    df %>%
    group_by(!!!group_by) %>%
    summarise(a = mean(a))
  }
> df <- tibble(
    g1 = c(1, 1, 2, 2, 2),
    g2 = c(1, 2, 1, 2, 1),
    a = sample(5), 
    b = sample(5)
  )
> my_summarise(df, g1, g2)
[[1]]
<quosure: global>
~g1

[[2]]
<quosure: global>
~g2

attr(,"class")
[1] "quosures"
# A tibble: 4 x 3
# Groups:   g1 [?]
     g1    g2     a
  <dbl> <dbl> <dbl>
1     1     1   1.0
2     1     2   5.0
3     2     1   2.5
4     2     2   4.0

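I would like to supply the variables I wish to group by explicitly as an argument to my function, but I decided to test whether the function works if I pass the grouping variables through ... instead...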

table_summary <- function(df     = .,
                          id     = individual_id,
                          select = c(),
                          group  = c(),
                          digits = 3,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id     <- enquo(id)
    quo_select <- enquo(select)
    ## quo_group  <- quos(group)
    quo_group  <- quos(...)
    UQS(quo_group) %>% print()
    ## Subset the data
    df <- df %>%
          dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
          unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
          gather(key = variable, value = value, !!quo_select)
    ## Summarise selected variables by specified groups
    results <- df %>%
               group_by(!!!quo_group, variable) %>%
               summarise(n    = n(),
                         mean = mean(value, na.rm = TRUE))
    return(results)
}

...but it doesn't: quos() again unquote-splices to NULL, so the variables are neither selected nor grouped by...

> table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062
> table_summary(df = mtcars, id = model, select = c(mpg), gear)
NULL
# A tibble: 1 x 3
  variable     n     mean
     <chr> <int>    <dbl>
1      mpg    32 20.09062

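I have been round this loop a few times now, checking every way of using enquo() and quos(), but I cannot see where I am going wrong despite having read the Programming with dplyr documentation several times.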

1 Answer

If I understand your post correctly, you want to supply c(col1, col2) to group_by(). That verb does not support this:

group_by(mtcars, c(cyl, am))
#> Error in mutate_impl(.data, dots) :
#>   Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

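That's because group_by() has mutate semantics, not select semantics. This means the expressions you supply to group_by() are transformation expressions. This is a surprising but very convenient feature. For example, you can group by disp cut into three intervals like this: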

group_by(mtcars, cut3 = cut(disp, 3))

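It also means that if you supply c(cyl, am), it concatenates the two columns and returns a vector of length 64, whereas a vector of length 32 (the number of rows) is expected.

So your problem is that you want a group_by() wrapper with select semantics. This is easy to do with dplyr::select_vars(), which will soon be extracted into the new tidyselect package: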
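The length mismatch can be checked in base R without dplyr. This is only an illustration of what c() does to two columns of mtcars; the variable name combined is made up for the example.

```r
# c() flattens its arguments into one vector, so two 32-row columns
# become a single vector of length 64 rather than two grouping columns.
combined <- c(mtcars$cyl, mtcars$am)

length(combined)  # twice the number of rows
nrow(mtcars)      # the length group_by() expects
```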

library("dplyr")

group_wrapper <- function(df, groups = rlang::chr()) {
  groups <- select_vars(tbl_vars(df), !! enquo(groups))
  group_by(df, !!! rlang::syms(groups))
}

Alternatively, you can wrap the new verb group_by_at(), which has select semantics:

group_wrapper <- function(df, groups = rlang::chr()) {
  group_by_at(df, vars(!! enquo(groups)))
}

Let's try it:

group_wrapper(mtcars, c(disp, am))
#> # A tibble: 32 x 11
#> # Groups:   disp, am [27]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
#> # ... with 22 more rows

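The advantage of this interface is that it supports all select() operations for choosing the columns to group by.

Note that I used rlang::chr() as the default argument because c() returns NULL, which the selection functions do not support (we may want to change that in the future). Calling chr() with no arguments returns a character vector of length 0.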
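Applied to the function in the question, a minimal sketch might look like the following. It assumes dplyr and tidyr are attached and that the installed versions still provide group_by_at() and gather() (both have been superseded in later releases). The argument defaults and the unused digits and ... arguments from the question are dropped for brevity, and the data frame name cars is made up for the example.

```r
library(dplyr)
library(tidyr)

table_summary <- function(df, id, select, group) {
  # Capture each argument as a single quosure; c(gear, hp) stays one expression
  quo_id     <- enquo(id)
  quo_select <- enquo(select)
  quo_group  <- enquo(group)
  df %>%
    # select() has select semantics, so c(gear, hp) is accepted here
    dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
    unique() %>%
    gather(key = variable, value = value, !!quo_select) %>%
    # group_by_at() also has select semantics, unlike group_by()
    group_by_at(vars(!!quo_group, variable)) %>%
    summarise(n    = n(),
              mean = mean(value, na.rm = TRUE))
}

cars <- mtcars
cars$model <- rownames(cars)
res <- table_summary(cars, id = model, select = c(mpg), group = c(gear, hp))
```

The only change to the grouping step is swapping group_by(!!quo_group, variable) for group_by_at(vars(!!quo_group, variable)), so a single variable such as group = gear should be handled the same way.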
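The difference between the two defaults can be seen in base R. The sketch uses character(0) as a stand-in for what the answer says rlang::chr() returns, so it does not depend on rlang being installed; the variable names are made up for the example.

```r
# An empty c() is NULL, which is not a vector of column names
empty_default <- c()
is.null(empty_default)

# A zero-length character vector is still a character vector,
# i.e. a valid (empty) selection of names
zero_chr <- character(0)
is.character(zero_chr)
length(zero_chr)
```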

answered 2017-06-09T19:23:28.260