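I have a data frame and need to create a flag that marks the rows where the match between two columns is only partial. Here is the code and some dummy data: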
doc_id <- c("doc1","doc1","doc2","doc3","doc3","doc4","doc4")
word <- c("apple","apples","chicken","banana","bananas","veggie","veggies")
text <- c("yesterday I ate apples", "yesterday I ate apples", "yesterday I ate chicken", "yesterday I ate bananas", "yesterday I ate bananas", "yesterday I ate veggies", "yesterday I ate veggies")
mydata <- data.frame(doc_id,word,text,stringsAsFactors = FALSE)
The expected result is the same data frame with one additional column showing whether the match between word and text is a partial match:
doc_id <- c("doc1","doc1","doc2","doc3","doc3","doc4","doc4")
word <- c("apple","apples","chicken","banana","bananas","veggie","soup")
text <- c("yesterday I ate apples", "yesterday I ate apples", "yesterday I ate chicken", "yesterday I ate bananas", "yesterday I ate bananas", "yesterday I ate veggies", "yesterday I ate soup")
partial_match <- c("1","0","0","1","0","1","0")
mydata2 <- data.frame(doc_id,word,text,partial_match,stringsAsFactors = FALSE)
I have tried:
str_detect(mydata$word, mydata$text)
as well as similar attempts with functions such as charmatch, pmatch, grep and grepl, but without success.
The real data contains thousands of records, so the solution needs to scale.
Thanks.
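One possible reading of "partial match" is that the word occurs somewhere in the text but not as a complete word. Below is a minimal base R sketch under that assumption; it is not necessarily the only or best approach. The names contains_word and whole_word are made up for illustration, and the sketch also assumes the words contain no regex metacharacters (they would need escaping before being wrapped in \\b). Note also that stringr's str_detect takes the string first and the pattern second, so the attempt above searches for the text inside the word rather than the other way round.

```r
# Dummy data from the question
doc_id <- c("doc1","doc1","doc2","doc3","doc3","doc4","doc4")
word <- c("apple","apples","chicken","banana","bananas","veggie","veggies")
text <- c("yesterday I ate apples", "yesterday I ate apples", "yesterday I ate chicken",
          "yesterday I ate bananas", "yesterday I ate bananas", "yesterday I ate veggies",
          "yesterday I ate veggies")
mydata <- data.frame(doc_id, word, text, stringsAsFactors = FALSE)

# Row-wise test: does word[i] occur anywhere in text[i]? (literal substring search)
contains_word <- mapply(function(w, t) grepl(w, t, fixed = TRUE),
                        mydata$word, mydata$text, USE.NAMES = FALSE)

# Row-wise test: does word[i] occur in text[i] as a whole word?
# Assumption: words contain no regex metacharacters.
whole_word <- mapply(function(w, t) grepl(paste0("\\b", w, "\\b"), t, perl = TRUE),
                     mydata$word, mydata$text, USE.NAMES = FALSE)

# Partial match: found as a substring, but not as a complete word
mydata$partial_match <- as.integer(contains_word & !whole_word)
```

On the dummy data this yields 1, 0, 0, 1, 0, 1, 0, as integers rather than the character values in the expected output. mapply loops over the rows one pair at a time, which should be fine for thousands of records. If stringr is available, str_detect(mydata$text, fixed(mydata$word)) is vectorised over both arguments and could replace the first mapply call.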