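Does anyone have a good example of using regular expressions to parse text in R? In the example below, I want to parse the string and extract the account number, the vehicle name, and the maintenance type.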
string[0]: 3423423
string[1]: Nissan
string[2]: Sparkplugs
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
It's a bit clunky, but it works:
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
cuts <- c("Account: ", "vehicle ", "Maint: ")
sapply(cuts, function(x){sapply(strsplit(unlist(strsplit(string, x))[2]," "),"[",1)})
Account: vehicle Maint:
"3423423" "Nissan" "Sparkplugs"
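If the three fields always appear in this order, a single pattern with capture groups may be easier to read. This is only a sketch using base R's `regexec` and `regmatches`; the `pattern`, the `fields` variable, and the names `account`, `vehicle`, and `maint` are my own choices, not anything from the question.

```r
string <- "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"

# One pattern, three capture groups: digits after "Account: ",
# then one space-free token after "vehicle " and after "Maint: ".
pattern <- "Account: ([0-9]+).*vehicle ([^ ]+).*Maint: ([^ ]+)"

# regexec() records the position of the full match and of each group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(string, regexec(pattern, string))[[1]]

# m[1] is the full match, so drop it and label the three groups.
fields <- setNames(m[-1], c("account", "vehicle", "maint"))
fields
```

Note that this assumes the labels occur once each and in this order; if the string does not match, `m` comes back empty and the `setNames` call will fail.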
The approach below returns every match, not just the first one, and it works with any pattern. You define the starting point, `item`:
string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
getter <- function(item, string) {
  # Find every occurrence of item followed by a run of non-space characters
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  # Skip past item itself so only the value after it is kept
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  # Extract one substring per match
  res <- mapply(substr, string, start, end)
  names(res) <- NULL
  res
}
account <- getter("Account: ", string)
vehicle <- getter("vehicle ", string)
maint <- getter("Maint: ", string)
Or, to make it more automated:
items <- c("Account: ", "vehicle ", "Maint: ")
sapply(items, function(x) getter(x, string))
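For what it's worth, the same idea can be written without the start/end arithmetic by using a lookbehind together with `regmatches`. This is a sketch, not a drop-in replacement: `getter2` and the two-account `s` string are made up for illustration, and it assumes `item` is a fixed-length literal, since PCRE lookbehinds must have a known length.

```r
# Hypothetical input with two records, to show that all matches come back.
s <- "Account: 111 vehicle Nissan Maint: Oil Account: 222 vehicle Ford Maint: Brakes"

getter2 <- function(item, string) {
  # (?<=item) asserts that item precedes the match without including it,
  # so the match itself is just the space-free token that follows.
  # Lookbehind is a PCRE feature, hence perl = TRUE.
  pattern <- paste0("(?<=", item, ")[^ ]+")
  regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
}

items <- c("Account: ", "vehicle ", "Maint: ")
# lapply keeps one character vector per item, whatever the number of matches.
result <- lapply(items, function(x) getter2(x, s))
names(result) <- items
result
```

When an item has no match, `getter2` returns `character(0)` rather than an empty string, which is usually easier to test for.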