r - What is the correct way to combine any(), all(), etc. with dplyr::filter() + dplyr::across()?

Question

Say I have the following data.frame df:

#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello        10          3
# 2 WHAT_hello  WHAT_hello WHAT_hello         1          2
# 3      Hello       Hello      Hello         9          1

I would like to process the data.frame so that it keeps only those rows that contain the prefix WHAT_ in at least one of col1, col2, or col3.

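Now I know I can do this easily using |, but I am trying to achieve it with dplyr::filter, using dplyr::across and tidyselect::matches to point at the right columns, together with base::any and stringr::str_detect. But this does not seem to work, even with dplyr::rowwise.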

So what is the correct way to approach this? What exactly am I doing wrong?

I want to use across + any mainly because I may not know in advance how many of these columns I have in the actual dataset.

Below is my example (data + code):

#Libraries.
library(base)
library(dplyr)
library(tidyselect)
library(stringr)
library(magrittr)



#Toy data.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"), 
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"), 
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3), 
                 othercol11 = sample(1:10, 3), 
                 stringsAsFactors = FALSE)



#Works.
df %>% 
  filter(str_detect(col1, "^WHAT_") | str_detect(col2, "^WHAT_") | str_detect(col3, "^WHAT_"))

#Output.
# col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4


#Works (incorrectly).
df %>% 
  filter(
    across(.cols = matches("^col"), 
           .fns = ~ any(str_detect(.x, "^WHAT")) )
  )

#Output.
# col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         1          2
# 2 WHAT_hello  WHAT_hello WHAT_hello         5          4
# 3      Hello       Hello      Hello         4          7



#Works (incorrectly) also.
df %>% 
  rowwise() %>%
  filter(
    across(.cols = matches("^col"), 
           .fns = ~ any(str_detect(.x, "^WHAT")) )
  )

#Output.
#   col1       col2       col3       othercol1 othercol11
#   <chr>      <chr>      <chr>          <int>      <int>
# 1 WHAT_hello WHAT_hello WHAT_hello         5          4

score 3 · Accepted Answer

For functions that should be applied across a row rather than down a column, you can use c_across with rowwise:

df %>% 
  rowwise() %>% 
  filter(any(str_detect(c_across(matches('^col')), '^WHAT')))

# # A tibble: 2 x 5
# # Rowwise: 
#   col1       col2        col3       othercol1 othercol11
#   <chr>      <chr>       <chr>          <int>      <int>
# 1 Hello      WHAT_hello2 Hello              9          7
# 2 WHAT_hello WHAT_hello  WHAT_hello         3         10
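
To see why the two attempts in the question behave as they do, it can help to look at what across() returns before filter() uses it. The sketch below uses fixed values for othercol1 and othercol11 instead of sample() (an assumption made only for reproducibility) and swaps filter() for summarise() and mutate() so the intermediate logical values are visible. As far as I can tell, filter() combines the columns returned by across() with &, which is consistent with both outputs in the question.

```r
library(dplyr)
library(stringr)

# Same toy data as the question, with fixed numbers instead of sample().
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# Without rowwise(): any() collapses each whole column to one value.
# Every column contains at least one match, so each flag is TRUE and
# that single TRUE is recycled to every row (all rows are kept).
per_column <- df %>%
  summarise(across(matches("^col"), ~ any(str_detect(.x, "^WHAT_"))))

# With rowwise(): across() still evaluates one column at a time, so
# any() sees a single cell. The result is one logical column per input
# column, and only rows where all of them are TRUE survive a filter().
per_row <- df %>%
  rowwise() %>%
  mutate(across(matches("^col"), ~ any(str_detect(.x, "^WHAT_")))) %>%
  ungroup()

print(per_column)
print(per_row)
```

c_across() avoids both problems because it hands any() the values of all selected columns for the current row as a single vector.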

Alternatively, use across with rowSums:

row_lgl <- 
  df %>% 
    transmute(across(.cols = matches("^col"), .fns = ~ str_detect(.x, "^WHAT"))) %>% 
    rowSums %>% 
    '>'(0)
           
df %>% 
  filter(row_lgl)
#         col1        col2       col3 othercol1 othercol11
# 1      Hello WHAT_hello2      Hello         9          7
# 2 WHAT_hello  WHAT_hello WHAT_hello         3         10
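
The rowSums approach works because TRUE counts as 1 when summed, so the row sum is the number of matching columns in that row. A minimal base R sketch of the intermediate values follows, again with fixed numbers in place of sample() (an assumption for reproducibility); the names target_cols, hits, n_hits, any_row and all_row are illustrative only.

```r
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# Columns whose names start with "col" (othercol1/othercol11 do not).
target_cols <- grep("^col", names(df), value = TRUE)

# Logical matrix: one row per data row, one column per target column.
hits <- sapply(df[target_cols], function(x) grepl("^WHAT_", x))

# Number of matching columns in each row.
n_hits <- rowSums(hits)

any_row <- n_hits > 0            # row-wise any()
all_row <- n_hits == ncol(hits)  # row-wise all()

print(df[any_row, ])
print(df[all_row, ])
```

Comparing the count against 0 gives the row-wise equivalent of any(), and comparing it against the number of columns gives the equivalent of all().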

score 1 · Accepted Answer

Using base

df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"), 
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"), 
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3), 
                 othercol11 = sample(1:10, 3), 
                 stringsAsFactors = FALSE)

df 
#>         col1        col2       col3 othercol1 othercol11
#> 1      Hello WHAT_hello2      Hello         1          9
#> 2 WHAT_hello  WHAT_hello WHAT_hello         3          2
#> 3      Hello       Hello      Hello         4          8

df[apply(df, 1, function(x) sum(grepl(pattern = "^WHAT_", x = x))) != 0, ]
#>         col1        col2       col3 othercol1 othercol11
#> 1      Hello WHAT_hello2      Hello         1          9
#> 2 WHAT_hello  WHAT_hello WHAT_hello         3          2

^{Created on 2021-01-20 by the reprex package (v0.3.0)}
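
Note that apply(df, 1, ...) first converts the whole data frame to a character matrix, so the pattern is also tested against the numeric columns. That is harmless here, but a variant restricted to the col-prefixed columns might look like the sketch below. It generalizes the | chain from the question with Reduce(), which is a swapped-in technique rather than this answer's method; the fixed numbers and the names target_cols, hits, keep and result are assumptions for illustration.

```r
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# Only the columns whose names start with "col".
target_cols <- grep("^col", names(df), value = TRUE)

# One logical vector per column, TRUE where the value starts with WHAT_.
hits <- lapply(df[target_cols], function(x) grepl("^WHAT_", x))

# Element-wise OR across however many columns were selected.
keep <- Reduce(`|`, hits)

result <- df[keep, ]
print(result)
```

Replacing `|` with `&` would keep only the rows where every selected column matches.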

Using tidyverse

library(tidyverse)
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"), 
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"), 
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3), 
                 othercol11 = sample(1:10, 3), 
                 stringsAsFactors = FALSE)


df %>% 
  filter(rowSums(across(.cols = where(is.character), .fns = ~ str_detect(.x, "^WHAT"))) != 0)
#>         col1        col2       col3 othercol1 othercol11
#> 1      Hello WHAT_hello2      Hello         1          3
#> 2 WHAT_hello  WHAT_hello WHAT_hello         7          4

^{Created on 2021-01-20 by the reprex package (v0.3.0)}
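
dplyr 1.0.4 and later, which is newer than the version these answers were written against, also provides if_any() and if_all(), which express the row-wise "any column" and "all columns" conditions directly inside filter(). A minimal sketch, assuming such a dplyr version is installed and using fixed numbers in place of sample():

```r
library(dplyr)
library(stringr)

df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10L, 1L, 9L),
                 othercol11 = c(3L, 2L, 1L),
                 stringsAsFactors = FALSE)

# Keep rows where at least one col* column starts with WHAT_.
any_match <- df %>%
  filter(if_any(matches("^col"), ~ str_detect(.x, "^WHAT_")))

# Keep rows where every col* column starts with WHAT_.
all_match <- df %>%
  filter(if_all(matches("^col"), ~ str_detect(.x, "^WHAT_")))

print(any_match)
print(all_match)
```

Here the predicate stays vectorised over rows (no any() or all() inside the lambda); if_any() and if_all() do the combining across columns.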
