regex - Remove all text between two brackets

Question

Suppose I have some text like this:

text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")

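I want to remove (edit: get rid of) all text between [ and ] (and the brackets themselves). What is the best way to do this? Here is my feeble attempt using regex and the stringr package: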

str_extract(text, "\\[[a-z]*\\]")

Thanks for your help!

score 28 · Accepted Answer

Use this:

gsub("\\[[^\\]]*\\]", "", subject, perl=TRUE);

What the regex means:

  \[                       # '['
  [^\]]*                   # any character except: '\]' (0 or more
                           # times (matching the most amount possible))
  \]                       # ']'
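
As a quick check of the pattern explained above, here is a minimal sketch on a shortened stand-in for the question's text. The variable names `sample_text` and `cleaned` are illustrative and not part of the original answer.

```r
# Shortened stand-in for the question's text (illustrative only)
sample_text <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# \\[ matches "[", [^\\]]* matches everything up to the next "]",
# and \\] matches the closing "]"; perl = TRUE selects the PCRE engine
cleaned <- gsub("\\[[^\\]]*\\]", "", sample_text, perl = TRUE)
print(cleaned)
#> [1] ": We need tax policies. : It's harder to save."
```

Note that the colon and space after each speaker tag are left behind, since the pattern only covers the brackets and what is inside them.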

score 10 · Accepted Answer

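The following should do the trick. The ? forces a lazy match, so .* matches as few characters as possible before the subsequent ].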

gsub('\\[.*?\\]', '', text)
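
To see why the ? matters, the sketch below compares the greedy and lazy forms on a made-up string with two bracketed parts. The names `x`, `greedy` and `lazy` are assumptions for illustration.

```r
# Made-up input with two bracketed parts
x <- "a [b] c [d] e"

# Greedy: .* runs from the first "[" to the LAST "]", swallowing " c " too
greedy <- gsub("\\[.*\\]", "", x)

# Lazy: .*? stops at the first "]" after each "[", so " c " survives
lazy <- gsub("\\[.*?\\]", "", x)

print(greedy)
#> [1] "a  e"
print(lazy)
#> [1] "a  c  e"
```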

score 3 · Accepted Answer

Here is another approach:

library(qdap)
bracketX(text, "square")

answered 2014-05-31T07:42:25.010

score 3 · Accepted Answer

There is no need to use a PCRE regex with a negated character class / bracket expression; a "classic" TRE regex will work too:

subject <- "Some [string] here and [there]"
gsub("\\[[^][]*]", "", subject)
## => [1] "Some  here and "

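See the online R demo

Details:

\\[ - a literal [ (it must be escaped, or placed inside a bracket expression like [[], to be parsed as a literal [)
[^][]* - a negated bracket expression that matches 0+ characters other than [ and ] (note that a ] at the start of a bracket expression is treated as a literal ])
] - a literal ] (this character is not special in either PCRE or TRE regex and does not have to be escaped).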

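If you only want to replace the square brackets with other delimiters, use a capturing group with a backreference in the replacement pattern: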

gsub("\\[([^][]*)\\]", "{\\1}", subject)
## => [1] "Some {string} here and {there}"

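See another demo

The parenthesized construct (...) forms a capturing group, and its contents can be accessed with the backreference \1 (since the group is the first in the pattern, its ID is set to 1).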

score 3 · Accepted Answer

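I think this technically answers your question, but you probably want to add \\: followed by a space to the end of the regex for prettier text (removing the colon and space).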

library(stringr)
str_replace_all(text, "\\[.+?\\]", "")

#> [1] ": We need tax policies that respect the wage earners..."

versus...

str_replace_all(text, "\\[.+?\\]\\: ", "")
#> [1] "We need tax policies that respect the wage earners..."

Created on 2018-08-16 by the reprex package (v0.2.0).
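
For readers without stringr, the same cleanup can be sketched in base R. This is only one possible variant: it combines the bracket expression from the TRE answer with an optional colon and any trailing whitespace. `sample_text` is a made-up shortened input, not the original data.

```r
# Shortened stand-in for the question's text (illustrative only)
sample_text <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# \\[[^][]*]    the bracketed speaker tag
# :?            an optional colon right after it
# [[:space:]]*  any whitespace that follows
cleaned <- gsub("\\[[^][]*]:?[[:space:]]*", "", sample_text)
print(cleaned)
#> [1] "We need tax policies. It's harder to save."
```

Making the colon optional means tags that are not followed by a colon are still removed.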
