r - Refer to a quoted column name in an R function

Question

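I want to use the na_omit function from the collapse package inside a user-defined function. na_omit requires the column name to be passed in quotes as one of its arguments. If I did not need the column name in quotes, I could simply refer to it with double braces, {{col}}, as described in the "Programming with dplyr" vignette. If I refer to the column using the glue package, e.g. glue::glue("{col}"), I get an error.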

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with:

my_df %>% 
  collapse::na_omit(cols = c("color_code"))

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

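How should I refer to a quoted column name inside a user-defined function in R, when the column is passed as an argument to my function and then on to another function's parameter?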

score 2 · Accepted Answer

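In general, collapse is mostly standard evaluation, and its NSE features are based on base R, so most rlang and glue constructs such as {{ }} do not work, but you get simpler and faster code. For programming with NSE in base R, see http://adv-r.had.co.nz/Computing-on-the-language.html.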
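To see why the glue attempt in the question fails, it may help to note that glue("{col}") has to evaluate col, and under standard evaluation that means looking up an object called color_code where the function was called. A minimal base R sketch of that failure, with paste0() standing in for glue() and force_name() as a hypothetical helper name:

```r
# Hypothetical helper: it evaluates `col`, as glue("{col}") would have to.
force_name <- function(col) {
  paste0(col)
}

# Passing a bare name makes R look for an object called color_code.
# No such object exists, so an error is signalled and caught here.
bare_result <- tryCatch(force_name(color_code), error = function(e) e)
inherits(bare_result, "error")

# Passing a string works, because a string evaluates to itself.
string_result <- force_name("color_code")
string_result
```

So the bare name has to be captured before it is evaluated, or the caller has to pass a string.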

As r2evans suggested, for a single column the solution is:

my_func <- function(df, col) { 
  col_char_ref <- as.character(substitute(col))
  df %>% 
    collapse::na_omit(cols = col_char_ref)
}

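That is, use substitute() to capture the expression and as.character() or all.vars() to extract the variable name. For multiple columns, a general solution is to wrap fselect, e.g.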

library(collapse)
my_func <- function(df, ...) {
  cols <- fselect(df, ..., return = "indices")
  na_omit(df, cols = cols) 
}

my_func(wlddev, PCGDP:GINI, POP) |> head()
#>   country iso3c       date year decade                region
#> 1 Albania   ALB 1997-01-01 1996   1990 Europe & Central Asia
#> 2 Albania   ALB 2003-01-01 2002   2000 Europe & Central Asia
#> 3 Albania   ALB 2006-01-01 2005   2000 Europe & Central Asia
#> 4 Albania   ALB 2009-01-01 2008   2000 Europe & Central Asia
#> 5 Albania   ALB 2013-01-01 2012   2010 Europe & Central Asia
#> 6 Albania   ALB 2015-01-01 2014   2010 Europe & Central Asia
#>                income  OECD    PCGDP LIFEEX GINI       ODA     POP
#> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
#> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
#> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
#> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
#> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
#> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104

Created on 2022-02-03 by the reprex package (v2.0.1)
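The single-column approach can be checked in base R alone. The sketch below is only an illustration (capture_name() and capture_vars() are hypothetical helper names): it shows what as.character(substitute()) and all.vars(substitute()) return for a bare name, a string, and a compound expression.

```r
# Hypothetical helpers wrapping the two capture idioms.
capture_name <- function(col) as.character(substitute(col))
capture_vars <- function(col) all.vars(substitute(col))

# A bare symbol is turned into its name.
name_bare <- capture_name(color_code)

# A string literal passes through unchanged, so callers may quote or not.
name_string <- capture_name("color_code")

# For a compound expression, all.vars() keeps only the variable names,
# whereas as.character() would also return the operator.
vars_expr <- capture_vars(log(color_code + other_col))
name_expr <- capture_name(color_code + other_col)
```

For anything beyond a single bare name, all.vars() tends to be the safer of the two, since as.character() on a call also returns the function or operator name.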

score 0 · Accepted Answer

You have to supply the column name as a character string, for example:

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) # col is a character string here, so glue() can interpolate it
}

my_func(my_df, col = "color_code")
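A practical benefit of passing the name as a string is that it can come from a variable or a loop. A rough base R sketch, assuming a hypothetical helper drop_na_rows() that stands in for collapse::na_omit() and only accepts a character vector of column names:

```r
# Hypothetical stand-in for collapse::na_omit(df, cols = ...), base R only.
drop_na_rows <- function(df, cols) {
  # Standard evaluation: insist on strings that name existing columns.
  stopifnot(is.character(cols), all(cols %in% names(df)))
  keep <- stats::complete.cases(df[, cols, drop = FALSE])
  df[keep, , drop = FALSE]
}

example_df <- data.frame(
  color_code = c("V9G", NA, "J4C"),
  color = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

# The column name lives in an ordinary variable.
target <- "color_code"
cleaned <- drop_na_rows(example_df, target)
```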

score 0 · Accepted Answer

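It is important first to establish which environment you are programming in within R. Are you using dplyr or base R? If dplyr, refer to the documentation on programming with dplyr, rlang and glue, and to this Stack Overflow answer. If base R, refer to the documentation on non-standard evaluation, in particular as.character(substitute()) for referring to a quoted column name, and wrapping the call in eval(substitute()).

Note that both approaches above involve non-standard evaluation. An alternative is to use standard evaluation (or some "combination" of standard and non-standard evaluation). For an example, see the question raised at this link.

The problem in this question stems at least partly from mixing up these environments. Below are some different approaches in a reprex.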

Data

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

Packages

library(collapse)
library(dplyr)
library(stringr)
library(glue)

Functional programming in base R (non-standard evaluation)
With a quoted column name:

my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col)) #Use as.character(substitute()) to refer to a quoted column name
  df %>% 
    collapse::na_omit(cols = col_char_ref) 
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  collapse::na_omit(cols = "color_code")

And with an unquoted column name:

my_func <- function(df, col){
  df <- df # This makes sure "df" is available inside the function environment where we evaluate the ftransform expression
  eval(substitute(collapse::ftransform(df, count = stringr::str_length(col)))) # Wrap the function to be evaluated in eval(substitute())
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  collapse::ftransform(count = stringr::str_length(color))
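The role of the df <- df line can be illustrated without collapse. In the base R sketch below, transform() stands in for ftransform() and nchar() for str_length(); count_chars() and call_from_function() are hypothetical names. After df <- df, df is an ordinary variable rather than a promise, so substitute() inserts the data frame itself into the call, and the call no longer depends on the caller's variable name being visible.

```r
count_chars <- function(df, col) {
  df <- df  # turn the argument promise into an ordinary variable
  # substitute() replaces `col` with the caller's expression (e.g. color)
  # and `df` with the data frame value; eval() then runs the built call.
  eval(substitute(transform(df, count = nchar(col))))
}

# Calling from inside another function, where the data frame is a local
# variable that count_chars() could not look up by name.
call_from_function <- function() {
  local_df <- data.frame(color = c("Blue", "Red"), stringsAsFactors = FALSE)
  count_chars(local_df, color)
}

result <- call_from_function()
```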

Functional programming in dplyr (non-standard evaluation) using glue and dplyr functions
With a quoted column name:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

Or, using sprintf (a wrapper for the C function), with a quoted column name:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

And with an unquoted column name:

my_func <- function(df, col){
  df %>%  
    dplyr::mutate(count = stringr::str_length({{ col }}))
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  dplyr::mutate(count = stringr::str_length(color))

Correcting code that produces an error
The following error-producing code motivates the two examples after it:

my_func <- function(df, col){
  df <- df
  df %>%  
    collapse::na_omit(cols = as.character(substitute(col))) %>% 
    eval(substitute(collapse::ftransform(description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Error in ckmatch(cols, nam) : Unknown columns: col
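One plausible reading of this error, offered as an assumption rather than a statement about magrittr's internals: substitute() only replaces names bound directly in the environment it is called from, and the right-hand side of %>% appears to be evaluated in a child of the function's environment, where col is not bound. The base R sketch below reproduces that effect with an explicit child environment (both function names are hypothetical).

```r
# substitute() called directly in the function's own frame.
capture_direct <- function(col) {
  as.character(substitute(col))
}

# substitute() called in a child environment of the function's frame,
# mimicking (by assumption) how a piped step might be evaluated.
capture_in_child <- function(col) {
  child <- new.env(parent = environment())
  as.character(eval(quote(substitute(col)), child))
}

direct <- capture_direct(color_code)    # "color_code"
nested <- capture_in_child(color_code)  # "col"
```

Under that reading, capturing the name on its own line before the pipe starts, as the earlier base R example does, avoids the problem.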

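The examples below are alternatives that do not produce an error.

Functional programming in base R (standard evaluation - requires the column to be passed to the function as a string)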

library(pkgcond)

my_func <- function(df, col) {
  if (!is.character(substitute(col)))
    pkgcond::pkg_error("col must be a quoted string") #if users aren't used to quoted strings as inputs to a function
  df <- na_omit(df, cols = col) 
  df$count <- stringr::str_length(.subset2(df, col))
  df
}

my_func(my_df, "color_code")

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(count = stringr::str_length(color_code))

Functional programming in base R (a "combination" of standard and non-standard evaluation)

my_func <- function(df, col){
  df <- df
  df <- collapse::na_omit(df, cols = as.character(substitute(col))) # Unlike the code with the error, the function is not piped (using %>%)
  eval(substitute(collapse::ftransform(df, description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(description = stringr::str_length(color_code))

A more complex example using the collapse package can be found at this link.
