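I am using the code below for partial matching, where each name gets a single match, but I have a follow-up question: suppose we add a criterion for fish and want "dog fish" to be classified as both fish and canine. Is this possible?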

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                 "black panther", "short cat", "red bird",
                 "short bird stuffed", "big eagle", "bad sparrow",
                 "dog fish", "head dog", "brown yorkie",
                 "lab short bulldog"), label=1:14)

Define the regexes at the beginning of the code

regexes <- list(c("(cat|lion|tiger|panther)","feline"),
            c("(bird|eagle|sparrow)","avian"),
            c("(dog|yorkie|bulldog)","canine"))

Create a vector the same length as the data frame

output_vector <- character(nrow(d))

For each regex..

for (i in seq_along(regexes)) {
  # Grep through d$name and, wherever a match is found, insert the relevant
  # tag into the output vector
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}

Insert the now-populated output vector into the data frame

d$species <- output_vector
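To make the limitation concrete, here is a minimal sketch of what happens if a fish pattern is simply appended to the list. The `"(fish)"` regex and its `"fish"` tag are assumptions added for illustration; they are not part of the code above. Each pass of the loop assigns into the same positions, so a later match replaces an earlier one instead of adding to it.

```r
# Two-row sample taken from the data above
d <- data.frame(name = c("dog fish", "head dog"), stringsAsFactors = FALSE)

regexes <- list(c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))  # hypothetical extra pattern

output_vector <- character(nrow(d))

for (i in seq_along(regexes)) {
  hits <- grepl(pattern = regexes[[i]][1], x = d$name)
  # Plain assignment: any label written by an earlier regex is overwritten
  output_vector[hits] <- regexes[[i]][2]
}

output_vector
# "fish" "canine"   ("dog fish" has lost its "canine" label)
```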

Desired output

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine, fish
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine

The original Stack Overflow question is here: partial string matching R

1 Answer

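I would do this with a cross join: pair every name with every partial string, then keep only the pairs where the name contains the string.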

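library(dplyr)
library(stringi)

# Lookup table: one row per partial string and the category it maps to.
# tibble() replaces data_frame(), which is deprecated.
key = tibble(partial = c("cat", "lion", "tiger", "panther",
                         "bird", "eagle", "sparrow",
                         "dog", "yorkie", "bulldog"),
             category = c("feline", "feline", "feline", "feline",
                          "avian", "avian", "avian",
                          "canine", "canine", "canine"))

d %>%
  # d and key share no column names, so merge() returns the cross join
  merge(key) %>%
  # Keep only the rows whose name contains the partial string
  filter(name %>% stri_detect_fixed(partial) )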
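The cross join returns one row per match, so a name that matches two partial strings appears twice. To get the single comma-separated label asked for in the question, the matches could be collapsed per name. The following is a sketch under assumptions rather than a definitive solution: the `fish` row in the key, the `species` column name, and the shortened sample data are all introduced here for illustration.

```r
library(dplyr)
library(stringi)

# Shortened sample of the question's data
d <- data.frame(name = c("brown cat", "red bird", "dog fish", "head dog"),
                label = 1:4, stringsAsFactors = FALSE)

# Key with an assumed extra "fish" row
key <- tibble(partial  = c("cat", "bird", "dog", "fish"),
              category = c("feline", "avian", "canine", "fish"))

result <- d %>%
  merge(key) %>%                             # cross join (no shared columns)
  filter(stri_detect_fixed(name, partial)) %>%
  group_by(name, label) %>%
  # Collapse all categories matched by one name; sort() makes the order stable
  summarise(species = paste(sort(unique(category)), collapse = ", "),
            .groups = "drop") %>%
  arrange(label)

result$species
# "feline" "avian" "canine, fish" "canine"
```

Note that `stri_detect_fixed` does plain substring matching, so a partial string such as "cat" would also match a name like "catfish". If that matters, a regex with word boundaries may be a better fit.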
answered 2015-10-10T06:04:22.487