6

I am trying to use the stringr package to extract the part of a string that lies between two specific patterns.

For example, I have:

my.string <- "nanaqwertybaba"
left.border  <- "nana"
right.border <- "baba"

and, using the str_extract(string, pattern) function (where the pattern is defined by a POSIX regular expression), I would like to get:

"qwerty"

The solutions I found through Google did not work.


4 Answers

14

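In base R, you can use gsub. The parentheses in pattern create numbered capture groups. Here we select the second group in replacement, i.e. the group between the borders. A . matches any character, and * means zero or more of the preceding element.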

gsub(pattern = "(.*nana)(.*)(baba.*)",
     replacement = "\\2",
     x = "xxxnanaRisnicebabayyy")
# "Risnice"
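One caveat worth illustrating: when the borders are absent, gsub has nothing to replace and returns the input unchanged. The sketch below, which is an illustration and not part of the original answer, contrasts that with base R's regexec/regmatches, where a missing match can be mapped to NA. The vector `x` and the helper `second_or_na` are hypothetical names chosen for this example.

```r
# Hypothetical input: one string with both borders, one without.
x <- c("xxxnanaRisnicebabayyy", "no borders here")

# gsub leaves non-matching strings untouched.
via_gsub <- gsub("(.*nana)(.*)(baba.*)", "\\2", x)

# regexec + regmatches returns the full match plus each capture group,
# or character(0) when there is no match.
groups <- regmatches(x, regexec("nana(.*)baba", x))

# Helper (assumed name): take the first capture group, or NA if no match.
second_or_na <- function(g) if (length(g) >= 2) g[2] else NA_character_
via_regexec <- vapply(groups, second_or_na, character(1))

print(via_gsub)
print(via_regexec)
```

Whether the unchanged-input behaviour is acceptable depends on the data; the NA variant makes failed matches easier to spot.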
answered 2014-04-07T22:46:17.743
9

I don't know whether or how this can be done with the functions provided by stringr, but you can also use base R's regexpr and substring:

pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")
# "(?<=nana)[a-z]+(?=baba)"

rx <- regexpr(pattern, text=my.string, perl=TRUE)
# [1] 5
# attr(,"match.length")
# [1] 6

substring(my.string, rx, rx+attr(rx, "match.length")-1)
# [1] "qwerty"
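As a possible variation on the same idea, base R's regmatches can take the regexpr result directly, which avoids the manual substring arithmetic and extends to vectors. This is only a sketch: the vector `strings` and the result `out` are names assumed for illustration, and the pattern is the one built above.

```r
# Hypothetical inputs: two strings with both borders, one without.
strings <- c("nanaqwertybaba", "nanaasdfbaba", "nothing")
pattern <- "(?<=nana)[a-z]+(?=baba)"

rx <- regexpr(pattern, strings, perl = TRUE)

# regmatches silently drops elements that did not match,
# so fill a result vector to keep positions aligned with the input.
out <- rep(NA_character_, length(strings))
out[rx > 0] <- regmatches(strings, rx)

print(out)
```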
answered 2014-04-07T22:43:12.180
6

I would use str_match from stringr: "str_match extracts capture groups formed by () from the first match. It returns a character matrix with one column for the complete match and one column for each group." ref

str_match(my.string, paste(left.border, '(.+)', right.border, sep=''))[,2]

The code above creates a regular expression with paste concatenating the capture group (.+) that captures 1 or more characters, with left and right borders (no spaces between strings).

A single match is assumed. So, [,2] selects the second column from the matrix returned by str_match.
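Because (.+) is greedy, it runs up to the last occurrence of the right border when that border appears more than once. A minimal sketch of the difference, assuming stringr is installed; the input `s` is a made-up string, and the lazy quantifier (.+?) is a suggested alternative, not something the answer above uses:

```r
library(stringr)

# Hypothetical input in which the right border "baba" occurs twice.
s <- "nanaqwertybabaXbaba"

# Greedy: captures everything up to the last "baba".
greedy <- str_match(s, "nana(.+)baba")[, 2]

# Lazy: stops at the first "baba" after the left border.
lazy <- str_match(s, "nana(.+?)baba")[, 2]

print(greedy)
print(lazy)
```

Note also that borders containing regex metacharacters (such as . or () would need escaping before being pasted into the pattern.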

answered 2015-02-11T09:52:42.510
0

You can use the unglue package:

library(unglue)
my.string <- "nanaqwertybaba"
unglue_vec(my.string, "nana{res}baba")
#> [1] "qwerty"
answered 2019-10-08T21:06:11.997