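Does anyone have a good example of parsing text in R with regular expressions? In the example below, I want to parse the string and extract the account number, the vehicle name, and the maintenance type.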

string[0]: 3423423 

string[1]: Nissan

string[2]: Sparkplugs

 string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs" 

2 Answers

A bit clunky, but it works:

string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
cuts <- c("Account: ", "vehicle ", "Maint: ")

sapply(cuts, function(x){sapply(strsplit(unlist(strsplit(string, x))[2]," "),"[",1)})

   Account:      vehicle       Maint:  
   "3423423"     "Nissan" "Sparkplugs"
answered 2013-08-13T15:29:13.010
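
The split-and-take-the-first-word idea in the answer above can also be sketched with a single capture group per marker. This is only an illustration, not the answerer's code: the helper name first_word_after is made up here, and it assumes each marker contains no regex metacharacters and occurs in the string (sub() returns the input unchanged when nothing matches).

```r
string <- "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"
cuts <- c("Account: ", "vehicle ", "Maint: ")

# Hypothetical helper: keep only the run of non-space characters
# that immediately follows the marker, discarding everything else.
first_word_after <- function(marker, text) {
  sub(paste0(".*", marker, "([^ ]+).*"), "\\1", text)
}

# vapply() guarantees one character value per marker and names
# the result after the markers, like the sapply() call above.
result <- vapply(cuts, first_word_after, character(1), text = string)
result
```

The values come back as "3423423", "Nissan" and "Sparkplugs", named by marker.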

This gives you every match rather than just the first one, and it works with any pattern.

You define the starting point, item:

string = "This is for Account: 3423423 his vehicle Nissan is going in for Maint: Sparkplugs"

getter <- function(item, string) {
  g <- gregexpr(paste0(item, "[^ ]+"), string)
  start <- g[[1]] + nchar(item)
  end <- g[[1]] + attr(g[[1]], "match.length") - 1
  res <- mapply(substr, string, start, end)
  names(res) <- NULL
  res
}

account <- getter("Account: ", string)
vehicle <- getter("vehicle ", string)
maint <- getter("Maint: ", string)

Or, to make it more automated:

items <- c("Account: ", "vehicle ", "Maint: ")
sapply(items, function(x) getter(x, string))
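
To see the "every match" behaviour, here is a minimal sketch using regmatches() in place of the manual start/end arithmetic. The name getter_all and the two-account test string are invented for illustration, and, like getter, it assumes item is a literal prefix with no regex metacharacters.

```r
# Hypothetical variant of getter(): regmatches() pulls out each full hit,
# then substring() drops the leading marker from every hit.
getter_all <- function(item, string) {
  hits <- regmatches(string, gregexpr(paste0(item, "[^ ]+"), string))[[1]]
  substring(hits, nchar(item) + 1)
}

# Made-up input containing the same marker twice.
two <- "Account: 111 and Account: 222"
getter_all("Account: ", two)
# [1] "111" "222"
```

When the marker does not occur at all, this version returns character(0).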
answered 2013-08-13T15:51:56.037