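Hi friends, I am trying to search a list of files for specific keywords (given in txt). I am using a regular expression to detect and replace occurrences of the keywords in the files. Below is the comma-separated keyword string that I pass in for the search.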
library(stringi)
txt <- "automatically got activated,may be we download,network services,food quality is excellent"
For example, "automatically got activated" should be found and replaced with "automatically_got_activated", "may be we download" with "may_be_we_download", and so on.
keywords <- strsplit(txt, split = ",", fixed = TRUE)[[1]] # split txt into individual keywords
# `text` is assumed to hold the contents of the file being searched (not defined in this snippet)
for (i in seq_along(keywords)) {
  start <- head(strsplit(keywords[i], split = " ")[[1]], 1) # first word of the keyword
  n <- stri_stats_latex(keywords[i])[["Words"]] # number of words in the keyword
  # candidate pattern: the first word followed by up to n - 1 further words
  pattern <- paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", n - 1, "}")
  # best match for the keyword in the file, lower-cased
  o <- tolower(regmatches(text, regexpr(pattern, text, ignore.case = TRUE)))
  p <- which(!is.na(pmatch(keywords, o))) # indices of keywords matched by a candidate
}
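
To make the goal concrete, here is a minimal sketch of the replacement described above, using base R only. It is not the approach from my loop. `replace_keywords` and `sample_text` are hypothetical names made up for illustration, and it assumes the keywords contain no regex metacharacters and that the words of a keyword are separated only by whitespace in the file.

```r
# Sketch only: replace_keywords() and sample_text are hypothetical, for illustration.
txt <- "automatically got activated,may be we download,network services,food quality is excellent"
keywords <- strsplit(txt, split = ",", fixed = TRUE)[[1]]

replace_keywords <- function(text, keywords) {
  for (kw in keywords) {
    words <- strsplit(kw, split = " ", fixed = TRUE)[[1]]
    # Allow any run of whitespace between the words; anchor on word boundaries.
    # Assumes the keyword has no regex metacharacters.
    pattern <- paste0("\\b", paste(words, collapse = "\\s+"), "\\b")
    # The replacement is the keyword itself with spaces turned into underscores.
    replacement <- paste(words, collapse = "_")
    text <- gsub(pattern, replacement, text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

sample_text <- "The app Automatically  got activated and the food quality is excellent."
result <- replace_keywords(sample_text, keywords)
print(result)
```

Since the replacement is built from the keyword rather than from the matched text, the output takes the keyword's casing (here "Automatically" becomes "automatically_got_activated").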