regex - R: capture everything from one pattern to another

Question

I am trying to extract the substring between two patterns, BB and </p>:

require("stringr")
str = "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"
str_extract(str, "BB.*?:</p>")

The extracted substring should be "word, otherword", but I am capturing too much:

  [1] "BB: word, otherword</p>\n    <p>Number:</p>"

score 2 · Accepted Answer

Maybe something like this?

> gsub(".*BB: (.*?)</p>.*$", "\\1", str)
# [1] "word, otherword"
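
The same capture group can also be read without replacing the whole string. The following is a minimal sketch in base R, assuming R 3.3.0 or later (where regexec accepts perl = TRUE); the variable names m and value are illustrative and not part of the original answer. One practical difference: gsub returns its input unchanged when the pattern does not match, whereas this form yields NA.

```r
# Sample input copied from the question.
str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"

# regexec() records the full match and each capture group;
# regmatches() turns those positions into strings.
m <- regmatches(str, regexec("BB: (.*?)</p>", str, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group.
# If nothing matched, m[[1]] is empty and this evaluates to NA.
value <- m[[1]][2]
print(value)
```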

Answered 2013-03-12T12:43:14.613

score 2 · Accepted Answer

This is a job for Perl regular expressions, specifically lookahead and lookbehind assertions. In stringr, you can wrap the regular expression in the perl function, like so:

str_extract(str, perl("(?<=BB: ).*?(?=</p>)"))
[1] "word, otherword"

You can also do this in base R:

regmatches(str, regexpr("(?<=BB: ).*?(?=</p>)", str, perl=TRUE))
[1] "word, otherword"
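
If the delimiters vary, the lookbehind/lookahead pattern can be built from arguments. The sketch below is one possible way to do that in base R; extract_between is a hypothetical helper name, not a function from stringr or base R. It assumes the delimiters are literal text (they are quoted with \Q...\E, so a delimiter that itself contains \E would break it) and that only the first match per string is wanted.

```r
# Hypothetical helper: return the text between two literal delimiters,
# or NA for elements where no match is found.
extract_between <- function(x, start, end) {
  # \Q...\E makes PCRE treat the delimiters as literal text.
  pattern <- paste0("(?<=\\Q", start, "\\E).*?(?=\\Q", end, "\\E)")
  m <- regexpr(pattern, x, perl = TRUE)

  # regmatches() silently drops non-matching elements, so fill a
  # result vector of the right length to keep positions aligned.
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1
  out[hit] <- regmatches(x, m)
  out
}

str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"
result <- extract_between(str, "BB: ", "</p>")
print(result)
```

With perl = TRUE the dot does not match a newline by default, so the match stays on the line containing the start delimiter.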
