
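I want to extract several pieces of data from a string using a single regular expression. I built a pattern that wraps those pieces in parenthesized subexpressions. In a Perl-like environment I would simply assign the subexpressions to variables with code like myvar1=$1; myvar2=$2; but how do I do this in R? So far, the only way I have found to access these matches is through regexec. That is not very convenient, because regexec does not support Perl syntax, among other reasons. This is what I have to do now: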

getoccurence <- function(text, rex, n) {
  # rex is the result of regexec(); its first element is the full match,
  # so capture group n sits at index n + 1
  occstart <- rex[[1]][n + 1]
  occstop  <- occstart + attr(rex[[1]], "match.length")[n + 1] - 1
  occtext  <- substr(text, occstart, occstop)
  return(occtext)
}
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"
rez <- regexec(mypattern, mytext)
var1 <- getoccurence(mytext, rez, 1)  
var2 <- getoccurence(mytext, rez, 2)  
var3 <- getoccurence(mytext, rez, 3)  

Obviously, this is a rather clumsy solution, and there should be something better. I would appreciate any suggestions.


3 Answers


Have you looked at regmatches?

> regmatches(mytext, rez)
[[1]]
[1] "12.3456, -01.234, valuable text before comma," "12.3456"                                      
[3] "-01.234"                     "valuable text before comma"                   

> sapply(regmatches(mytext, rez), function(x) x[4])
[1] "valuable text before comma"
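
To get back to the Perl-style assignment from the question, the result of regmatches can be indexed directly. The following is a minimal sketch in base R that reuses mytext and mypattern from the question; the name m is only illustrative.

```r
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# regmatches() applied to a regexec() result returns a list with one
# character vector per input string: element 1 is the full match and
# the remaining elements are the capture groups, in order.
m <- regmatches(mytext, regexec(mypattern, mytext))[[1]]

var1 <- m[2]  # first capture group
var2 <- m[3]  # second capture group
var3 <- m[4]  # third capture group
```

Because the full match occupies the first position, capture group n is always at index n + 1.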
answered 2013-01-24T04:53:10.873

In stringr, this is str_match, or str_match_all if you want to match every occurrence of the pattern in the string. str_match returns a matrix; str_match_all returns a list of matrices.

library(stringr)
str_match(mytext, mypattern)
str_match_all(mytext, mypattern)
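
As a rough sketch of how the returned matrix can be used (assuming the stringr package is installed; the name m is illustrative), the first column holds the full match and each following column holds one capture group, so the groups can be picked out by column:

```r
library(stringr)

mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# One row per input string; column 1 is the full match,
# columns 2..4 are the three capture groups.
m <- str_match(mytext, mypattern)

var1 <- m[1, 2]
var2 <- as.numeric(m[1, 3])  # convert the captured text to a number
var3 <- m[1, 4]
```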
answered 2013-01-24T13:27:13.630

strapply and strapplyc in the gsubfn package can do it in one step:

> strapplyc(mytext, mypattern)
[[1]]
[1] "12.3456"                    "-01.234"                   
[3] "valuable text before comma"

> # with simplify = c argument
> strapplyc(mytext, mypattern, simplify = c)
[1] "12.3456"                    "-01.234"                   
[3] "valuable text before comma"

> # extract second element only 
> strapply(mytext, mypattern, ... ~ ..2)
[[1]]
[1] "-01.234"

> # specify function slightly differently and use simplify = c
> strapply(mytext, mypattern, ... ~ list(...)[2], simplify = c)
[1] "-01.234"

> # same
> strapply(mytext, mypattern, x + y + z ~ y, simplify = c)
[1] "-01.234"

> # same but also convert to numeric - also can use with other variations above
> strapply(mytext, mypattern, ... ~ as.numeric(..2), simplify = c)
[1] -1.234

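In the examples above, the third argument can be a function or, as in the examples, a formula that gets converted to a function (the LHS represents the arguments and the RHS is the body).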
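
To make the formula-to-function correspondence concrete, here is a small sketch (assuming the gsubfn package is installed; by_formula and by_function are illustrative names) that passes the same logic once as a formula and once as an ordinary function. Both are expected to select the second capture group.

```r
library(gsubfn)

mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# Formula form: x, y and z on the left-hand side name the arguments,
# and the right-hand side is the function body.
by_formula <- strapply(mytext, mypattern, x + y + z ~ y, simplify = c)

# The same thing written as an explicit function of the three captures.
by_function <- strapply(mytext, mypattern, function(x, y, z) y, simplify = c)
```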

answered 2013-01-24T14:34:46.137