I am trying to write a function that wraps dplyr::coalesce() and takes in a data object and the column names to coalesce. So far, my attempts have failed.

Example data

library(dplyr)

df <-
  data.frame(col_a = c("bob", NA, "bob", NA, "bob"), 
                 col_b = c(NA, "danny", NA, NA, NA), 
                 col_c = c("paul", NA, NA, "paul", NA))

##   col_a col_b col_c
## 1   bob  <NA>  paul
## 2  <NA> danny  <NA>
## 3   bob  <NA>  <NA>
## 4  <NA>  <NA>  paul
## 5   bob  <NA>  <NA>

My attempts at writing the custom function

coalesce_plus_1 <- function(data, vars) {

  data %>%
    mutate(coalesced_col = coalesce(!!! rlang::syms(tidyselect::vars_select(names(.), vars))))

}
coalesce_plus_2 <- function(data, vars) {
  
  data %>%
    mutate(coalesced_col = coalesce(!!! rlang::syms(vars)))
  
}
coalesce_plus_3 <- function(data, vars) {
  
  data %>%
    mutate(coalesced_col = coalesce({{ vars }}))
  
}

Results

coalesce_plus_1()

df %>%
  coalesce_plus_1(data = ., vars = c(col_a, col_b, col_c))

Error: object 'col_a' not found

However:

df %>%
  coalesce_plus_1(data = ., vars = all_of(starts_with("col")))

##   col_a col_b col_c coalesced_col
## 1   bob  <NA>  paul           bob
## 2  <NA> danny  <NA>         danny
## 3   bob  <NA>  <NA>           bob
## 4  <NA>  <NA>  paul          paul
## 5   bob  <NA>  <NA>           bob


coalesce_plus_2()

df %>%
  coalesce_plus_2(data = ., vars = c(col_a, col_b, col_c))

Error in lapply(.x, .f, ...) : object 'col_a' not found

And:

df %>%
  coalesce_plus_2(data = ., vars = all_of(starts_with("col")))

Error: `starts_with()` must be used within a *selecting* function.
i See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
Run `rlang::last_error()` to see where the error occurred.



coalesce_plus_3()

df %>%
  coalesce_plus_3(data = ., vars = c(col_a, col_b, col_c))

Error: Problem with `mutate()` input `coalesced_col`.
x Input `coalesced_col` can't be recycled to size 5.
i Input `coalesced_col` is `coalesce(c(col_a, col_b, col_c))`.
i Input `coalesced_col` must be size 5 or 1, not 15.

And:

df %>%
  coalesce_plus_3(data = ., vars = all_of(starts_with("col")))

Error: Problem with `mutate()` input `coalesced_col`.
x `starts_with()` must be used within a *selecting* function.
i See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
i Input `coalesced_col` is `coalesce(all_of(starts_with("col")))`.

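Bottom line

How can I write a custom function around coalesce() that takes in a data object and the columns to coalesce, where the function's vars argument accepts both specific names (e.g., c(col_a, col_b, col_c)) and helper functions (e.g., starts_with("col"))?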

2 Answers

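Here is a simple implementation. The first version returned only the coalesced column, but it is easy to extend to keep all columns (I bind_cols() them back on at the end).

It is simple because we rely on select() to do the work for us, as suggested at the beginning of the Implementing tidyselect vignette.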

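# edited to keep all columns
coalesce_df <- function(data, ...) {
  data %>%
    # select() resolves bare names and tidyselect helpers alike
    select(...) %>%
    # do.call() passes each selected column to coalesce() as its own argument
    # (base R stand-in for invoke(), which comes from purrr, not dplyr)
    transmute(result = do.call(coalesce, unname(as.list(.)))) %>%
    bind_cols(data, .)
}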



df %>%
   coalesce_df(everything())
#   col_a col_b col_c result
# 1   bob  <NA>  paul    bob
# 2  <NA> danny  <NA>  danny
# 3   bob  <NA>  <NA>    bob
# 4  <NA>  <NA>  paul   paul
# 5   bob  <NA>  <NA>    bob

df %>% coalesce_df(col_a, col_b)
#   col_a col_b col_c result
# 1   bob  <NA>  paul    bob
# 2  <NA> danny  <NA>  danny
# 3   bob  <NA>  <NA>    bob
# 4  <NA>  <NA>  paul   <NA>
# 5   bob  <NA>  <NA>    bob
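
If you would rather keep the question's signature, with a single vars argument instead of ..., the same idea can probably be sketched with the embrace operator. This is only a sketch under a few assumptions: coalesce_vars is a made-up name for illustration, and it assumes a dplyr version whose select() understands {{ }} (I believe 1.0.0 or later is enough).

```r
library(dplyr)

df <- data.frame(
  col_a = c("bob", NA, "bob", NA, "bob"),
  col_b = c(NA, "danny", NA, NA, NA),
  col_c = c("paul", NA, NA, "paul", NA),
  stringsAsFactors = FALSE
)

# coalesce_vars is a hypothetical name, not part of dplyr
coalesce_vars <- function(data, vars) {
  # {{ vars }} forwards the caller's expression to select(), so bare names
  # and tidyselect helpers are both resolved in a selection context
  selected <- select(data, {{ vars }})
  # pass each selected column to coalesce() as a separate, unnamed argument
  data$coalesced_col <- do.call(coalesce, unname(as.list(selected)))
  data
}

by_name   <- coalesce_vars(df, c(col_a, col_b, col_c))
by_helper <- coalesce_vars(df, starts_with("col"))
```

Both calls should produce the same coalesced_col as the everything() example above. Note that a selection matching no columns would make coalesce() fail, so a real version may want to check for that first.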
answered 2020-11-23T17:04:01.853

Actually, your first function works; you just need to pass vars as a character vector. See:

df %>% coalesce_plus_1(data = ., vars = c("col_a","col_b","col_c"))
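
The reason, as far as I can tell, is that rlang::syms() expects strings: it turns each name into a symbol, and !!! then splices those symbols into the coalesce() call. A minimal sketch of that step on its own, which is the same mechanism as in your coalesce_plus_2 (the vars and out names are just for illustration):

```r
library(dplyr)

df <- data.frame(
  col_a = c("bob", NA, "bob", NA, "bob"),
  col_b = c(NA, "danny", NA, NA, NA),
  col_c = c("paul", NA, NA, "paul", NA),
  stringsAsFactors = FALSE
)

# column names given as strings, not bare names
vars <- c("col_a", "col_b", "col_c")

# syms() converts the strings to symbols; !!! splices them, so the call
# mutate() evaluates is effectively coalesce(col_a, col_b, col_c)
out <- df %>%
  mutate(coalesced_col = coalesce(!!!rlang::syms(vars)))
```

With bare names such as c(col_a, col_b, col_c), R tries to evaluate col_a as an ordinary object before syms() ever sees it, which is where the "object 'col_a' not found" error comes from.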

Here is another nice option:

library(dplyr)

df <- data.frame(col_a = c("bob", NA, "bob", NA, "bob"), 
                 col_b = c(NA, "danny", NA, NA, NA), 
                 col_c = c("paul", NA, NA, "paul", NA))

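coalesce_plus <- function(data, vars) {
  # all_of() marks vars as a character vector of column names
  x <- as.list(select(data, all_of(vars)))
  # !!! splices the list so each column becomes its own argument to coalesce()
  data.frame(data, coalesced_col = coalesce(!!!x))
}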

df %>% coalesce_plus(data = ., vars = c("col_a","col_b","col_c"))
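
If you also want bare names and helpers to work in vars, one possible variant resolves the selection with tidyselect::eval_select() instead of select(). Treat this as a sketch rather than a definitive implementation: coalesce_plus_sel is a hypothetical name, and it assumes the tidyselect and rlang packages that come with dplyr are installed.

```r
library(dplyr)

df <- data.frame(
  col_a = c("bob", NA, "bob", NA, "bob"),
  col_b = c(NA, "danny", NA, NA, NA),
  col_c = c("paul", NA, NA, "paul", NA),
  stringsAsFactors = FALSE
)

# coalesce_plus_sel is a hypothetical name used only for this sketch
coalesce_plus_sel <- function(data, vars) {
  # eval_select() returns the positions of the columns that vars selects
  pos <- tidyselect::eval_select(rlang::enquo(vars), data)
  # coalesce the selected columns, passed as separate unnamed arguments
  data$coalesced_col <- do.call(coalesce, unname(as.list(data[pos])))
  data
}

res_chr  <- coalesce_plus_sel(df, c("col_a", "col_b", "col_c"))
res_bare <- coalesce_plus_sel(df, c(col_a, col_b, col_c))
res_help <- coalesce_plus_sel(df, starts_with("col"))
```

All three calls should return the same data frame, so the caller can mix strings, bare names, and helpers as needed.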
answered 2020-11-23T17:15:07.313