I am trying to use the dplyr::filter() function to filter specific rows of my tibble.

Here is part of my tibble, head(raw.tb):

# A tibble: 738 x 4
      geno   ind     X     Y
     <chr> <chr> <int> <int>
 1 san1w16    A1   467   383
 2 san1w16    A1   465   378
 3 san1w16    A1   464   378
 4 san1w16    A1   464   377
 5 san1w16    A1   464   376
 6 san1w16    A1   464   375
 7 san1w16    A1   463   375
 8 san1w16    A1   463   374
 9 san1w16    A1   463   373
10 san1w16    A1   463   372
# ... with 728 more rows

When I run: raw.tb %>% dplyr::filter(ind == contains("A"))

I get: Error in filter_impl(.data, quo) : Evaluation error: No tidyselect variables were registered

In my tibble, unique(raw.tb$ind) is:

 [1] "A1"  "A10" "A11" "A12" "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "B1" 
[14] "B10" "B11" "B12" "B2"  "B3"  "B4"  "B5"  "B6"  "B7"  "B8"  "B9"  "C1"  "C10"
[27] "C11" "C12" "C2"  "C3"  "C4"  "C5"  "C6"  "C7"  "C8"  "C9"  "D1"  "D10" "D11"
[40] "D12" "D2"  "D3"  "D4"  "D5"  "D6"  "D7"  "D8"  "D9"  "E1"  "E10" "E11" "E12"
[53] "E2"  "E3"  "E4"  "E5"  "E6"  "E7"  "E8"  "E9"  "F1"  "F10" "F11" "F12" "F2" 
[66] "F3"  "F4"  "F5"  "F6"  "F7"  "F8"  "F9"  "G1"  "G10" "G11" "G2"  "G3"  "G4" 
[79] "G5"  "G6"  "G7"  "G8"  "G9"  "H1"  "H10" "H11"

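I would like to extract only the rows where raw.tb$ind starts with "A", using tidyverse syntax.

(I know how to do this in base R, but my goal is to use the tidyverse.)

Any feedback is much appreciated.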

2 Answers

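filter requires a logical vector to filter the rows. The select helper functions (?select_helpers), such as contains, select the columns of a dataset based on some pattern; they do not test the values within a column. To filter rows, we can use grepl from base R: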

raw.tb %>%
   dplyr::filter(grepl("A", ind)) 

Or str_detect from stringr (one of the tidyverse packages):

raw.tb %>%
  dplyr::filter(stringr::str_detect(ind, "A"))
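
Note that the pattern "A" matches an "A" anywhere in the string, whereas the question asks for values that start with "A". For the ind values shown in the question the two coincide, but anchoring the pattern with ^ makes the intent explicit. A minimal sketch, assuming dplyr and stringr are installed; toy.tb and its values are hypothetical, made up only for illustration:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: "BA1" contains an "A" but does not start with one
toy.tb <- tibble(
  ind = c("A1", "A10", "B1", "BA1"),
  X   = c(467L, 465L, 464L, 463L)
)

# Unanchored pattern: matches "A" anywhere in the string (keeps "BA1" too)
anywhere <- toy.tb %>% dplyr::filter(stringr::str_detect(ind, "A"))

# Anchored pattern: "^" restricts the match to the start of the string
starts_regex <- toy.tb %>% dplyr::filter(stringr::str_detect(ind, "^A"))

# Base R equivalents: anchored grepl(), or startsWith() for a fixed prefix
starts_grepl <- toy.tb %>% dplyr::filter(grepl("^A", ind))
starts_base  <- toy.tb %>% dplyr::filter(startsWith(ind, "A"))
```

Here anywhere keeps three rows, while the three anchored variants keep only "A1" and "A10".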

answered 2018-02-04T12:24:38.183


Just writing out akrun's comment as an answer; @akrun, feel free to take over this answer if you like.

Create some data,

dput(raw.tb) 
raw.tb <- structure(list(geno = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "san1w16", class = "factor"), ind = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 1L), .Label = c("A1", "B1", "C1", 
"D1", "E1"), class = "factor"), X = c(467L, 465L, 464L, 464L, 
464L, 464L, 463L, 463L, 463L, 463L), Y = c(383L, 378L, 378L, 
377L, 376L, 375L, 375L, 374L, 373L, 372L)), .Names = c("geno", 
"ind", "X", "Y"), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10"), class = c("tbl_df", "tbl", "data.frame"
))

The data,

raw.tb
#> # A tibble: 10 x 4
#>       geno    ind     X     Y
#>  *  <fctr> <fctr> <int> <int>
#>  1 san1w16     A1   467   383
#>  2 san1w16     A1   465   378
#>  3 san1w16     B1   464   378
#>  4 san1w16     B1   464   377
#>  5 san1w16     C1   464   376
#>  6 san1w16     C1   464   375
#>  7 san1w16     D1   463   375
#>  8 san1w16     D1   463   374
#>  9 san1w16     E1   463   373
#> 10 san1w16     A1   463   372

Method #1

raw.tb %>% dplyr::filter(stringr::str_detect(ind, "A"))
#> # A tibble: 3 x 4
#>      geno    ind     X     Y
#>    <fctr> <fctr> <int> <int>
#> 1 san1w16     A1   467   383
#> 2 san1w16     A1   465   378
#> 3 san1w16     A1   463   372

Method #2

raw.tb %>% dplyr::filter(grepl("A", ind))
#> # A tibble: 3 x 4
#>      geno    ind     X     Y
#>    <fctr> <fctr> <int> <int>
#> 1 san1w16     A1   467   383
#> 2 san1w16     A1   465   378
#> 3 san1w16     A1   463   372
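
In this example ind is a factor rather than a character column. Both grepl() and str_detect() coerce it to character before matching, which is why the calls above work unchanged. Base R's startsWith(), by contrast, accepts only character input, so a factor would need an explicit conversion. A small sketch under those assumptions; toy.tb is a hypothetical stand-in for the data above:

```r
library(dplyr)

# Hypothetical toy data with a factor column, mirroring the structure above
toy.tb <- tibble(
  ind = factor(c("A1", "A1", "B1", "C1", "A1")),
  X   = c(467L, 465L, 464L, 464L, 463L)
)

# grepl() coerces the factor to character before matching
kept <- toy.tb %>% dplyr::filter(grepl("^A", ind))

# startsWith() requires character input, so convert the factor explicitly
kept_prefix <- toy.tb %>% dplyr::filter(startsWith(as.character(ind), "A"))
```

Both calls should return the same three "A1" rows.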

answered 2018-02-04T12:26:29.577