python - Regex find and replace multiple matches

Question

I'm trying to write a regex that matches every occurrence of

[[any text or char here]]

in a body of text.

For example:

My name is [[Sean]]
There is a [[new and cool]] thing here.

With my regex, this all works fine.

import re

data = "this is my test string [[ that does some matching ]] then returns."
p = re.compile(r"\[\[(.*)\]\]")
data = p.sub('STAR', data)

The problem shows up when there are multiple matches, such as [[hello]] and [[bye]].

For example:

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
p = re.compile(r"\[\[(.*)\]\]")
data = p.sub('STAR', data)

This matches from the opening brackets of hello through the closing brackets of bye. I want it to replace both of them separately.

score 3 · Accepted Answer

.* is greedy and matches as much text as possible, including ]] and [[, so it runs straight past your "tag" boundaries.

A quick fix is to make the star lazy by appending a ?:

p = re.compile(r"\[\[(.*?)\]\]")
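
To see the difference concretely, here is a minimal sketch that runs both patterns against the string from the question. The names greedy and lazy are only illustrative.

```python
import re

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"

greedy = re.compile(r"\[\[(.*)\]\]")   # the pattern from the question
lazy = re.compile(r"\[\[(.*?)\]\]")    # the same pattern with a lazy star

greedy_result = greedy.sub("STAR", data)
lazy_result = lazy.sub("STAR", data)

print(greedy_result)  # this is my new string it contains STAR and nothing else
print(lazy_result)    # this is my new string it contains STAR and STAR and nothing else
```

The greedy version swallows everything from the first [[ to the last ]], so only one replacement happens.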

A better solution (more robust and explicit, but slightly slower) is to state explicitly that the match cannot cross a tag boundary:

p = re.compile(r"\[\[((?:(?!\]\]).)*)\]\]")

Explanation:

\[\[        # Match [[
(           # Match and capture...
 (?:        # ...the following regex:
  (?!\]\])  # (only if we're not at the start of the sequence ]])
  .         # any character
 )*         # Repeat any number of times
)           # End of capturing group
\]\]        # Match ]]
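
As a sketch, the same pattern can be written with re.VERBOSE so the comments live inside the regex itself. The second half uses a hypothetical extra requirement that is not in the question (the tag must be followed by "!") to show one case where the lazy version and the lookahead version give different results.

```python
import re

# Same pattern as above, written with re.VERBOSE so comments are allowed inside it.
tag = re.compile(r"""
    \[\[            # match [[
    (               # match and capture...
      (?:           # ...the following group:
        (?!\]\])    #   only if we are not at the start of ]]
        .           #   any character
      )*            # repeated any number of times
    )               # end of capturing group
    \]\]            # match ]]
""", re.VERBOSE)

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
captured = tag.findall(data)       # ['hello', 'bye']
replaced = tag.sub("STAR", data)

# Hypothetical extension (an assumption, not part of the question):
# the tag must be immediately followed by "!".
text = "[[a]] [[b]]!"
lazy_hits = re.findall(r"\[\[(.*?)\]\]!", text)               # ['a]] [[b']
strict_hits = re.findall(r"\[\[((?:(?!\]\]).)*)\]\]!", text)  # ['b']
```

When more pattern follows the closing ]], the lazy star can still backtrack across a ]] to make the overall match succeed, while the negative lookahead never allows it.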

score 2 · Accepted Answer

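Use non-greedy matching: .*?. A ? placed after + or * makes the quantifier match as few characters as possible. By default quantifiers are greedy and consume as many characters as possible.

p = re.compile(r"\[\[(.*?)\]\]")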
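
A small sketch of what the trailing ? does to each quantifier, using a throwaway string rather than the data from the question:

```python
import re

text = "aaaa"

greedy_plus = re.match(r"a+", text).group()   # 'aaaa': as many as possible
lazy_plus = re.match(r"a+?", text).group()    # 'a': as few as possible, at least one
greedy_star = re.match(r"a*", text).group()   # 'aaaa'
lazy_star = re.match(r"a*?", text).group()    # '': zero characters is enough
```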

score 1 · Accepted Answer

You can use this:

p = re.compile(r"\[\[[^\]]+\]\]")

>>> import re
>>> data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
>>> p = re.compile(r"\[\[[^\]]+\]\]")
>>> data = p.sub('STAR', data)
>>> data
'this is my new string it contains STAR and STAR and nothing else'
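
One thing to keep in mind with the negated character class: it rejects any ] inside the brackets, and + requires at least one character. A minimal sketch on a made-up input (not from the question) compares it with the lazy pattern:

```python
import re

# Made-up input: a tag containing a single ], an empty tag, and a plain tag.
sample = "[[a]b]] and [[]] and [[ok]]"

char_class = re.compile(r"\[\[[^\]]+\]\]")
lazy = re.compile(r"\[\[(.*?)\]\]")

char_class_result = char_class.sub("STAR", sample)
lazy_result = lazy.sub("STAR", sample)

print(char_class_result)  # [[a]b]] and [[]] and STAR
print(lazy_result)        # STAR and STAR and STAR
```

If tag contents never contain ] and are never empty, the character class version behaves the same as the others.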
