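I have a list of items and a list of search terms, and I am trying to do two things:
- Search the items for a match against any of the search terms, and return TRUE if a match is found.
- For every item that returns TRUE (i.e. a match exists), also return the original search term that was matched in step 1.
So, given the following data frame: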
             items
1             alex
2 alex is a person
3   this is a test
4            false
5    this is cathy
and the following list of search terms:
"alex" "bob" "cathy" "derrick" "erica" "ferdinand"
I would like to create the following output:
             items matches original
1             alex    TRUE     alex
2 alex is a person    TRUE     alex
3   this is a test   FALSE     <NA>
4            false   FALSE     <NA>
5    this is cathy    TRUE    cathy
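Step 1 is fairly simple, but I am having trouble with step 2. To create the "matches" column, I use grepl() to create a variable that is TRUE if the row in d$items contains one of the search terms and FALSE otherwise.
For step 2, my idea was that I should be able to use grep() with value = TRUE, as in the code further down. However, this returns the wrong value: instead of the original search term that grep matched, it returns the value of the item that matched. So I get the following output: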
             items matches         original
1             alex    TRUE             alex
2 alex is a person    TRUE alex is a person
3   this is a test   FALSE             <NA>
4            false   FALSE             <NA>
5    this is cathy    TRUE    this is cathy
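To isolate the behaviour, here is a minimal sketch on toy vectors (x, pattern, idx and vals are made-up names for illustration, not part of my real data). As far as I can tell, grep() with value = TRUE simply returns the elements of the input vector at the matching positions, which would explain the output above:

```r
# Toy input: three strings, two of which contain a search term
x <- c("alex is a person", "this is a test", "this is cathy")
pattern <- "alex|cathy"

# Without value = TRUE, grep() returns the positions of the matching elements
idx <- grep(pattern, x)

# With value = TRUE, grep() returns the matching elements themselves,
# i.e. the whole strings from x, not the part of the pattern that matched
vals <- grep(pattern, x, value = TRUE)

print(idx)
print(vals)
```

So value = TRUE appears to be shorthand for x[grep(pattern, x)], which is not what I need for the "original" column.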
Here is the code I am currently using. Any ideas would be much appreciated!
# Dummy data and search terms
d <- data.frame(items = c("alex", "alex is a person", "this is a test", "false", "this is cathy"))
searchTerms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")
# One alternation over all search terms; each term must be preceded and followed
# by a non-letter (or the start/end of the string), so it cannot sit between letters
pattern <- paste("(^|[^a-zA-Z])", searchTerms, "($|[^a-zA-Z])", sep = "", collapse = "|")
# Return TRUE iff a search term is found in the items column, not between letters
d$matches <- grepl(pattern, d$items, ignore.case = TRUE)
# Subset data
dMatched <- d[d$matches, ]
# This is where the problem is: I want the search term that grepl matched above,
# but value = TRUE returns the matching element of items instead
dMatched$original <- grep(pattern, dMatched$items, ignore.case = TRUE, value = TRUE)
d$original[d$matches] <- dMatched$original
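One direction that might work, offered as a sketch rather than a confirmed solution: regexpr() records where the first match starts and how long it is, and regmatches() extracts that substring. The lookaround pattern below (which needs perl = TRUE) is my own assumption for expressing "not between letters" without capturing the surrounding characters, and the names pattern, m and found are hypothetical. It only reports the first term found in each item.

```r
# Same dummy data as in the question; stringsAsFactors = FALSE keeps items as character
d <- data.frame(items = c("alex", "alex is a person", "this is a test", "false", "this is cathy"),
                stringsAsFactors = FALSE)
searchTerms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

# Assumed pattern: a search term that is neither preceded nor followed by a letter.
# Lookarounds match positions only, so the extracted text is the bare term.
pattern <- paste0("(?<![A-Za-z])(", paste(searchTerms, collapse = "|"), ")(?![A-Za-z])")

# regexpr() returns the start of the first match per element, or -1 if none
m <- regexpr(pattern, d$items, ignore.case = TRUE, perl = TRUE)
d$matches <- as.integer(m) != -1L

# regmatches() returns the matched substrings for the elements that matched, in order
found <- regmatches(d$items, m)

# Map the matched text back to the search term as originally written (case-insensitive)
d$original <- NA_character_
d$original[d$matches] <- searchTerms[match(tolower(found), tolower(searchTerms))]

print(d)
```

On the dummy data this fills original with the search term for rows 1, 2 and 5 and leaves NA elsewhere. If an item can contain several search terms, gregexpr() in place of regexpr() would be the variant to look at, though I have not worked that case through here.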