r - dplyr: How to use group_by inside a function?

Question

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to it.

Can someone provide a working example?

library(dplyr)
data(iris)
iris %.% group_by(Species) %.% summarise(n = n()) # 
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
mytable0(iris, "Species") # OK
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
mytable1(iris, "Species") # Wrong!
# Error: unsupported type for column 'as.name(key)' (SYMSXP)

mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
mytable2(iris, "Species") # Wrong!
# Error: index out of bounds

score 73 · Accepted Answer

For programming, group_by_ is the counterpart to group_by:

library(dplyr)

mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
mytable(iris, "Species")
# or iris %>% mytable("Species")

This gives:

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50

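Update: At the time this was written, dplyr used %.%, which is what was originally used above. %>% is now favored, so the code above has been changed to keep it relevant.

Update 2: regroup is now deprecated; use group_by_ instead.

Update 3: Per Roberto's comment, group_by_(list(...)) now becomes group_by_(...) in newer versions of dplyr.

Update 4: Added a minor variation suggested in the comments.

Update 5: With rlang/tidyeval it is now possible to do this: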

library(dplyr)
library(rlang)
mytable <- function(x, ...) {
  # syms() takes a single vector or list of strings, so collect the dots first
  group_ <- syms(c(...))
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, "Species")

or passing Species unevaluated, i.e. without quotes:

library(rlang)
mytable <- function(x, ...) {
  group_ <- enquos(...)
  x %>% 
    group_by(!!!group_) %>% 
    summarise(n = n())
}
mytable(iris, Species)
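
When the dots are only being handed on to group_by(), they can also be forwarded directly, without capturing them with enquos() first. The following is a minimal sketch of that pattern, not part of the original answer: the helper name count_by is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: forwards unquoted column names straight to group_by().
count_by <- function(x, ...) {
  x %>%
    group_by(...) %>%
    summarise(n = n(), .groups = "drop")  # .groups assumes dplyr >= 1.0.0
}

# Works with one or several grouping columns.
res <- count_by(mtcars, cyl, gear)
print(res)
```

Capturing with enquos() and splicing with !!! is still useful when the captured expressions need to be inspected or modified before grouping.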

Update 6: There is now a {{...}} notation that works if there is just one grouping variable:

mytable <- function(x, group) {
  x %>% 
    group_by({{group}}) %>% 
    summarise(n = n())
}
mytable(iris, Species)
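
Because {{ }} captures whatever expression the caller supplies, the argument does not have to be a bare column name. A small sketch under that assumption (the name count_by_one is hypothetical; {{ }} needs rlang 0.4.0 or later, and .groups needs dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical helper: one grouping expression, passed unquoted.
count_by_one <- function(x, group) {
  x %>%
    group_by({{ group }}) %>%
    summarise(n = n(), .groups = "drop")
}

# A bare column name
by_species <- count_by_one(iris, Species)

# A computed expression; the grouping column is labelled with the expression text
by_length <- count_by_one(iris, Sepal.Length > 5)

print(by_species)
print(by_length)
```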

score 11 · Accepted Answer

Update: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

library(tidyverse)
data("iris")

my_table <- function(df, group_var) {
  group_var <- enquo(group_var)      # Create quosure
  df %>% 
    group_by(!!group_var) %>%        # Use !! to unquote the quosure
    summarise(n = n())
}

my_table(iris, Species)

> my_table(iris, Species)
# A tibble: 3 x 2
     Species     n
      <fctr> <int>
1     setosa    50
2 versicolor    50
3  virginica    50

score 4 · Accepted Answer

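As a complement to Update 6 in the answer by @G. Grothendieck: if you want to pass a string as the argument to your summary function, then instead of embracing the argument with double braces ({{ }}), you should use the .data pronoun, as described in the programming vignette under "Loop over multiple variables":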

mytable <- function( x, group ) {
  x %>% 
    group_by( .data[[group]] ) %>% 
    summarise( n = n() )
}

group_string <- 'Species'

mytable( iris, group_string )

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  Species        n
  <fct>      <int>
1 setosa        50
2 versicolor    50
3 virginica     50
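
.data[[group]] takes a single string. For a character vector holding several column names, one option is across(all_of(...)) inside group_by(). This is a sketch rather than part of the answer above: the helper name count_by_names is hypothetical, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: group by every column named in a character vector.
count_by_names <- function(x, groups) {
  x %>%
    group_by(across(all_of(groups))) %>%
    summarise(n = n(), .groups = "drop")
}

group_strings <- c("cyl", "gear")
res <- count_by_names(mtcars, group_strings)
print(res)
```

all_of() raises an error if any of the names is missing from the data, which tends to surface typos early.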

score 2 · Accepted Answer

Ugly as they come, but it works:

mytable3 <- function(x, key) {
  my.call <- bquote(summarise(group_by(.(substitute(x)), NULL), n = n()))
  my.call[[2]][[3]] <- as.name(key)
  eval(my.call, parent.frame())
} 
mytable3(iris, "Species")
# Source: local data frame [3 x 2]
#
#      Species  n
# 1  virginica 50
# 2 versicolor 50
# 3     setosa 50

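I am pretty sure there are cases that will cause this to break, but you get the idea. I don't think you can get around messing with the call. One other thing that did work, but was even uglier, is: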

mytable4 <- function(x, key) summarise(group_by(x, x[[key]]), n = n())
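
One side effect of grouping by x[[key]] is that the grouping column is labelled with the expression text rather than with the column name. A possible variation, offered as an assumption and not as part of the original answer, names the column with the tidy eval := operator (the function name mytable4_named is hypothetical; this needs dplyr 0.7.0 or later):

```r
library(dplyr)

# Hypothetical variation of mytable4: `!!key :=` names the grouping column
# after the string held in `key`.
mytable4_named <- function(x, key) {
  summarise(group_by(x, !!key := x[[key]]), n = n())
}

res <- mytable4_named(iris, "Species")
print(res)
```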
