python - Replacing a string with a regular expression in Python

Question

I am looking for a regular expression that identifies a block in a template, so that I can supply text to replace the whole block:

<div>
 {% for link in links %}
     textext
 {% endfor %}
</div>

to get something like this:

<div>
 mytext
</div>

score 1 · Accepted Answer

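Try:

re.sub(r'\{.*[\w\s]*.*\}', 'mytext', txt)

Output:

'<div>\n mytext\n</div>'

\{ matches the first brace, then .*[\w\s]*.* matches everything in between (the [\w\s]* part is what crosses spaces and newlines), up to the last brace \}.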
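
As a quick check of the explanation above, here is a minimal, self-contained sketch that runs the pattern on the template from the question. The names txt and result are only illustrative, and the pattern is written as a raw string so that the backslashes reach the regex engine unchanged.

```python
import re

# The template from the question, as a single string.
txt = '<div>\n {% for link in links %}\n     textext\n {% endfor %}\n</div>'

# Replace everything from the first "{" to the last "}" the pattern can reach.
result = re.sub(r'\{.*[\w\s]*.*\}', 'mytext', txt)
print(repr(result))
```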

You can be more specific with the following:

re.sub(r'\{% for link in links.*[\w\s]*.*endfor %\}', 'mytext', txt)

Then you can be sure it will only match the type of for loop you specified.

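EDIT: eyquem pointed out that my answer falls short in many cases, especially when the block has symbols in the middle. At the risk of naively misunderstanding why my solution doesn't work, I just added an extra bit to my pattern, and it successfully matches even his test case, so we'll see whether it holds up:

re.sub(r'\{.*[\W\w\s]*.*\}', 'mytext', txt)

Result (where txt is eyquem's Pink Floyd example):

"Pink Floyd"
<div>
 mytext
</div>
"Fleetwood Mac"

So I think adding all the non-alphanumeric symbols fixes it. Or I may have broken it even more obviously in some other case. I'm sure someone will point it out. :)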

EDIT2: It should also be noted that both of our solutions fail if there is more than one for loop on the page. Example:

"Beatles"
<div>
 {% for link in links %}
    iiiY=uuu
    12345678
 {% endfor %}
</div>
"Tino Rossi"
{ for link in links % }
   asdfasdfas
{% endfor% }

yields

"Beatles"
<div>
 mytext

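and wipes out the rest, because the match runs on through the next block that comes AFTER it.

EDIT 2: eyquem, right once again, fixed his pattern so that it no longer cuts off whatever comes afterwards. His fix also fixes mine:

re.sub(r'\{.*[\W\w\s]*?.*\}', 'mytext', txt)

is the new pattern.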
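
To make the difference concrete, here is a small sketch comparing the greedy and the non-greedy version on a page with two for loops. The sample text and the variable names are made up for illustration. It only checks the two-loop case; a body that contains a stray } is a separate problem and is not covered here.

```python
import re

# Hypothetical page with two independent for blocks.
txt = ('<div>\n {% for a in b %}\n   x=1\n {% endfor %}\n</div>\n'
       '<p>keep</p>\n'
       '{% for c in d %}\n   y\n{% endfor %}\n')

# Greedy: [\W\w\s]* runs to the end of the text and backs up to the LAST "}".
greedy = re.sub(r'\{.*[\W\w\s]*.*\}', 'mytext', txt)

# Non-greedy: [\W\w\s]*? grows one character at a time and stops at the
# first line on which the trailing ".*\}" can succeed.
lazy = re.sub(r'\{.*[\W\w\s]*?.*\}', 'mytext', txt)

print(repr(greedy))
print(repr(lazy))
```

The only change between the two calls is the ? after the middle quantifier, which makes it match as little as possible instead of as much as possible.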

score 1 · Accepted Answer

I'm sorry to say that Logan's answer doesn't work in the following cases:

import re

ss1 = '''"Pink Floyd"
<div>
 {% for link in links %}
    aaaY}eee
    12345678
 {% endfor %}
</div>
"Fleetwood Mac"'''

# Logan's pattern, split into groups so that findall() shows what each part matched
pat = r'(\{.*)([\w\s]*)(.*)(\})'
print(ss1)
print('---------------------------')
for el in re.findall(pat, ss1):
    print(el)
print('---------------------------')
print(re.sub(pat, ':::::', ss1))

Result:

"Pink Floyd"
<div>
 {% for link in links %}
    aaaY}eee  # <--------- } here
    12345678
 {% endfor %}
</div>
"Fleetwood Mac"
---------------------------
('{% for link in links %}', '\n    aaaY', '', '}')
('{% endfor %', '', '', '}')
---------------------------
"Pink Floyd"
<div>
 :::::eee
    12345678
 :::::
</div>
"Fleetwood Mac"

.
.

import re

# Same test, but with a "=" (instead of a "}") in the line "iiiY=uuu"
ss2 = '''"Beatles"
<div>
 {% for link in links %}
    iiiY=uuu
    12345678
 {% endfor %}
</div>
"Tino Rossi"'''

pat = r'(\{.*)([\w\s]*)(.*)(\})'
print(ss2)
print('---------------------------')
for el in re.findall(pat, ss2):
    print(el)
print('---------------------------')
print(re.sub(pat, ':::::', ss2))

Result:

"Beatles"
<div>
 {% for link in links %}
    iiiY=uuu
    12345678
 {% endfor %}
</div>
"Tino Rossi"
---------------------------
('{% for link in links %', '', '', '}')
('{% endfor %', '', '', '}')
---------------------------
"Beatles"
<div>
 :::::
    iiiY=uuu
    12345678
 :::::
</div>
"Tino Rossi"

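The problem is the following (the findall() results in my code help to understand it):

The first .* runs for as long as it doesn't meet a newline.
Then [\w\s]* runs for as long as it finds characters of these categories: letters, digits, underscores, whitespace.
Newlines count as whitespace, so [\w\s]* can pass from one line to the next.
But if [\w\s]* meets a character that is not in these categories, it stops at that character.

If that character is a }, the last .* matches the empty string '' just before this }. The regex then searches for the next match.

If it is a =, the last .* cannot match the rest of the text up to the next }, because it cannot get past the next newline. Hence the result differs from the case of a } in the text.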
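
The behaviors described above can be checked in isolation. This is a minimal sketch with made-up sample strings and variable names, not part of the original test code:

```python
import re

# "." does not match a newline by default, so ".*" stays on one line.
first = re.match(r'.*', 'ab\ncd').group()

# "\s" includes the newline, so "[\w\s]*" can cross from one line to the next...
crossing = re.match(r'[\w\s]*', 'ab\ncd').group()

# ...but it stops at the first character outside the class, such as "}" or "=".
stop_brace = re.match(r'[\w\s]*', 'aaaY}eee').group()
stop_equal = re.match(r'[\w\s]*', 'iiiY=uuu').group()

print(repr(first), repr(crossing), repr(stop_brace), repr(stop_equal))
```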

.

Replacing .* with .+ doesn't change anything, as can be seen by replacing .* with .+ in the code above.

.

My solution

I propose the pattern in this code:

import re

# opening tag, then the shortest possible body, then the closing tag
pat = (r'\{%[^\r\n]+%\}'
       r'.+?'
       r'\{%[^\r\n]+%\}')


ss = '''"Pink Floyd"
<div>
 {% for link in links %}
    aaaY}eee
    12345678
 {% endfor %}
</div>
"Fleetwood Mac"
"Beth Hart"
"Jimmy Cliff"
"Led Zepelin"
Beatles"
<div>
 {% for link in links %}
    iiiY=uuu
    12345678
 {% endfor %}
</div>
"Tino Rossi"'''


print('\n' + ss, '\n\n---------------------------\n')
# re.DOTALL lets ".+?" cross newlines inside the block
print(re.sub(pat, ':::::', ss, flags=re.DOTALL))

which gives

"Pink Floyd"
<div>
 {% for link in links %}
    aaaY}eee
    12345678
 {% endfor %}
</div>
"Fleetwood Mac"
"Beth Hart"
"Jimmy Cliff"
"Led Zepelin"
Beatles"
<div>
 {% for link in links %}
    iiiY=uuu
    12345678
 {% endfor %}
</div>
"Tino Rossi" 

---------------------------

"Pink Floyd"
<div>
 :::::
</div>
"Fleetwood Mac"
"Beth Hart"
"Jimmy Cliff"
"Led Zepelin"
Beatles"
<div>
 :::::
</div>
"Tino Rossi"

EDIT

Simpler:

pat = (r'\{%[^}]+%\}'
       r'.+?'
       r'\{%[^}]+%\}')

This works only if the {%.....%} tag lines do not contain the symbol }.
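
If only for loops should be replaced, the same idea can be narrowed to tags that start with for and end with endfor. The sketch below is a variation under that assumption, not the pattern proposed above; FOR_BLOCK and replace_for_blocks are hypothetical names, and nested loops are not handled.

```python
import re

# Hypothetical variant: match "{% for ... %} body {% endfor %}" only.
# [^}]* keeps the opening tag from running past its own "%}";
# .*? with re.DOTALL takes the shortest body, which may contain "}" or "=".
FOR_BLOCK = re.compile(r'\{%\s*for\b[^}]*%\}.*?\{%\s*endfor\s*%\}', re.DOTALL)

def replace_for_blocks(text, replacement):
    """Replace every for/endfor block in text with replacement."""
    return FOR_BLOCK.sub(replacement, text)

sample = ('<div>\n {% for link in links %}\n    aaaY}eee\n {% endfor %}\n</div>\n'
          '<div>\n {% for x in xs %}\n    iiiY=uuu\n {% endfor %}\n</div>')
replaced = replace_for_blocks(sample, ':::::')
untouched = replace_for_blocks('{% if a %}q{% endif %}', ':::::')
print(replaced)
```

Compiling the pattern once keeps the call sites short, and other block tags such as if/endif are left alone.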

score 0 · Accepted Answer

The sledgehammer approach would be:

In [540]: txt = """<div>
 {% for link in links %}
     textext
 {% endfor %}
</div>"""

In [541]: txt
Out[541]: '<div>\n {% for link in links %}\n     textext\n {% endfor %}\n</div>'

In [542]: re.sub("(?s)<div>.*?</div>", "<div>mytext</div>", txt)
Out[542]: '<div>mytext</div>'
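
The (?s) at the start of the pattern is the inline spelling of the re.DOTALL flag, which lets . match newlines. A minimal sketch of the equivalence, with illustrative variable names:

```python
import re

txt = '<div>\n {% for link in links %}\n     textext\n {% endfor %}\n</div>'

# Inline flag and keyword flag behave the same way.
inline = re.sub(r'(?s)<div>.*?</div>', '<div>mytext</div>', txt)
flagged = re.sub(r'<div>.*?</div>', '<div>mytext</div>', txt, flags=re.DOTALL)

# Without the flag, "." cannot cross the newlines, so nothing is replaced.
without = re.sub(r'<div>.*?</div>', '<div>mytext</div>', txt)

print(inline, flagged == inline, without == txt)
```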
