
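Is it possible to use a grepl argument when referring to a list of values, maybe using the %in% operator? I want to take the data below, and if the animal name contains "dog" or "cat", return a specific value, say "keep"; if it contains neither "dog" nor "cat", return "discard".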

data <- data.frame(animal = sample(c("cat", "dog", "bird", "doggy", "kittycat"), 50, replace = TRUE))

Now, if I were doing this by strictly matching values such as "cat" and "dog", I could use the following:

matches <- c("cat","dog")

data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")
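A small deterministic check shows why %in% is not enough here. This is a sketch: the fixed animals vector below is an illustrative stand-in for the random sample. Because %in% compares whole strings, "doggy" and "kittycat" end up discarded.

```r
# Fixed input instead of sample(), so the result is reproducible
animals <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

# %in% tests whole-string equality, so partial matches are not kept
exact <- animals %in% matches
print(exact)
# [1]  TRUE  TRUE FALSE FALSE FALSE

exact_labels <- ifelse(exact, "Keep", "Discard")
```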

But grep or grepl only uses the first element of the pattern vector:

data$keep <- ifelse(grepl(matches, data$animal), "Keep","Discard")

returns

Warning message:
In grepl(matches, data$animal) :
  argument 'pattern' has length > 1 and only the first element will be used

Note that I found this thread while searching, but it doesn't seem to work: grep using a character vector with multiple patterns


3 Answers


You can use an "or" (|) statement in the regular expression passed to grepl:

ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
# [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"   
#[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"   
#[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"   
#[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
#[49] "keep"    "discard"

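The regular expression dog|cat tells the regex engine to look for either "dog" or "cat", so grepl returns TRUE for every element that matches either one.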
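To see the alternation on fixed input, here is a minimal sketch that uses the five distinct values from the question's sample() call rather than the random data. It also contrasts substring matching with an anchored pattern, which behaves like the exact %in% test.

```r
animals <- c("cat", "dog", "bird", "doggy", "kittycat")

# "dog|cat" is TRUE when either substring appears anywhere in the string
substring_hits <- grepl("dog|cat", animals)
print(substring_hits)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

# Anchoring with ^ and $ requires the whole string to be "dog" or "cat"
whole_hits <- grepl("^(dog|cat)$", animals)
print(whole_hits)
# [1]  TRUE  TRUE FALSE FALSE FALSE
```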

answered 2014-08-19T20:02:40.610

Not sure what you tried, but this seems to work:

data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")

This is similar to the answer you linked to.

The trick is to use paste:

paste(matches, collapse = "|")
#[1] "cat|dog"

So it builds a single regular expression meaning "cat or dog", and it also scales to a long list of patterns without typing each one out.
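One caveat with pasting patterns together: each element is still interpreted as a regular expression, so characters such as ".", "+" or "(" inside matches act as metacharacters. A sketch of one alternative (not part of the answer above) is to run grepl once per pattern with fixed = TRUE and combine the logical vectors with Reduce:

```r
animals <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

# Regex approach from the answer: a single "cat|dog" pattern
regex_hits <- grepl(paste(matches, collapse = "|"), animals)

# Literal approach: one grepl call per pattern (fixed = TRUE disables regex
# interpretation), then OR the resulting logical vectors element-wise
fixed_hits <- Reduce(`|`, lapply(matches, grepl, x = animals, fixed = TRUE))

identical(regex_hits, fixed_hits)
# [1] TRUE

# Why it can matter: as a regex, "c.t" also matches "cut"
dot_regex <- grepl("c.t", "cut")               # TRUE
dot_fixed <- grepl("c.t", "cut", fixed = TRUE) # FALSE
```

For plain words like "cat" and "dog" both approaches give the same result; the fixed version only makes a difference when the patterns contain regex syntax.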

EDIT:

If you later plan to subset the data.frame based on the "Keep" and "Discard" entries, you can do it more directly with:

data[grepl(paste(matches, collapse = "|"), data$animal),]

This way, the TRUE/FALSE results of grepl are used directly to select the rows.
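A sketch of that subsetting on a small fixed data.frame (the values are assumed for illustration). One thing to watch: when the data.frame has only one column, as the question's data does before the keep column is added, data[rows, ] simplifies to a plain vector, so drop = FALSE is needed to keep a data.frame.

```r
data <- data.frame(animal = c("cat", "dog", "bird", "doggy", "kittycat"),
                   stringsAsFactors = FALSE)
matches <- c("cat", "dog")

# Logical vector: TRUE for rows whose animal contains "cat" or "dog"
keep_rows <- grepl(paste(matches, collapse = "|"), data$animal)

# With a single column, the default drop behaviour returns a character vector
as_vector <- data[keep_rows, ]

# drop = FALSE keeps the result as a data.frame (rows 1, 2, 4 and 5)
kept <- data[keep_rows, , drop = FALSE]
```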

answered 2014-08-19T20:07:51.050

Try to avoid ifelse. For example, this works fine:

c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]

With a seed of 123, you get:

##  [1] "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Discard" "Keep"   
##  [9] "Discard" "Discard" "Keep"    "Discard" "Keep"    "Discard" "Keep"    "Keep"   
## [17] "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"   
## [25] "Keep"    "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [33] "Keep"    "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
## [41] "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"    "Discard"
## [49] "Keep"    "Keep"   
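The indexing trick works because adding 1 to a logical vector coerces it to numbers (FALSE becomes 1, TRUE becomes 2), which then select positions in the two-element label vector. A minimal sketch on fixed input:

```r
animals <- c("cat", "dog", "bird", "doggy", "kittycat")
hits <- grepl("(dog|cat)", animals)

# Logical + 1 gives numeric positions: FALSE -> 1, TRUE -> 2
idx <- hits + 1
print(idx)
# [1] 2 2 1 2 2

# Each position picks "Discard" (1) or "Keep" (2)
labels <- c("Discard", "Keep")[idx]
```

The resulting labels are the same as ifelse(hits, "Keep", "Discard") would give.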

answered 2014-08-19T21:00:25.677