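I am using the code below for partial string matching, where each row gets exactly one match, but I have a follow-up question: suppose we add a criterion for fish, and we want "dog fish" to be classified as both fish and canine. Is this possible?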
d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger",
"black panther", "short cat", "red bird",
"short bird stuffed", "big eagle", "bad sparrow",
"dog fish", "head dog", "brown yorkie",
"lab short bulldog"), label=1:14)
#Define the regexes at the beginning of the code
regexes <- list(c("(cat|lion|tiger|panther)","feline"),
c("(bird|eagle|sparrow)","avian"),
c("(dog|yorkie|bulldog)","canine"))
#Create a vector, the same length as the df
output_vector <- character(nrow(d))
#For each regex..
for(i in seq_along(regexes)){
#Grep through d$name, and when you find matches, insert the relevant 'tag' into
#The output vector
output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}
#Insert the now-filled output vector into the dataframe
d$species <- output_vector
Desired output:
#                 name label      species
#1           brown cat     1       feline
#2            blue cat     2       feline
#3            big lion     3       feline
#4          tall tiger     4       feline
#5       black panther     5       feline
#6           short cat     6       feline
#7            red bird     7        avian
#8  short bird stuffed     8        avian
#9           big eagle     9        avian
#10        bad sparrow    10        avian
#11           dog fish    11 canine, fish
#12           head dog    12       canine
#13       brown yorkie    13       canine
#14  lab short bulldog    14       canine
The original Stack Overflow question is here: Partial string matching in R
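For reference, here is a minimal sketch of one way this might be done, not necessarily the best one. It assumes a fourth regex entry for fish (the pattern "(fish)" and the tag "fish" are my own additions, not part of the original code) and appends each new tag to whatever the row already holds instead of overwriting it. Tags come out in the order of the regex list, so "dog fish" would read "canine, fish".

```r
# Sample data from the question
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14)

# Same regex list as in the question, plus an assumed "fish" entry
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))

# Start with an empty tag for every row
output_vector <- character(nrow(d))

for (i in seq_along(regexes)) {
  # Rows whose name matches the current regex
  hit <- grepl(pattern = regexes[[i]][1], x = d$name)
  # If the row has no tag yet, set it; otherwise append after ", "
  output_vector[hit] <- ifelse(output_vector[hit] == "",
                               regexes[[i]][2],
                               paste(output_vector[hit], regexes[[i]][2],
                                     sep = ", "))
}

d$species <- output_vector
```

Rows that match only one regex keep a single tag, so the rest of the table should stay as in the desired output. Rows that match nothing are left as an empty string.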