python - python .replace() regex

Question

I'm trying to grab everything after the '</html>' tag and delete it, but my code doesn't seem to do anything. Does .replace() not support regular expressions?

z.write(article.replace('</html>.+', '</html>'))

score 623 · Accepted Answer

No. Regular expressions in Python are handled by the re module.

article = re.sub(r'(?is)</html>.+', '</html>', article)

In general:

text_after = re.sub(regex_search_term, regex_replacement, text_before)
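To see why the original call appeared to do nothing: str.replace treats its first argument as a literal string, so the text '</html>.+' is searched for character by character and never found. A minimal sketch of the difference, using a made-up sample string in place of the asker's article:

```python
import re

# Hypothetical sample input; any text with content after </html> would do.
article = "<html><body>Hi</body></html>\ntracking junk\nmore junk"

# str.replace looks for the literal characters "</html>.+", which do not occur.
unchanged = article.replace('</html>.+', '</html>')

# re.sub interprets the pattern: (?s) lets "." match newlines, (?i) ignores case.
cleaned = re.sub(r'(?is)</html>.+', '</html>', article)

print(unchanged == article)  # the literal replace changed nothing
print(cleaned)
```

The (?s) flag matters here because the unwanted text usually spans several lines, and without it "." stops at the first newline.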

score 83 · Accepted Answer

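To replace text using a regular expression, use the re.sub function:

sub(pattern, repl, string[, count, flags])

It replaces non-overlapping occurrences of pattern in string with the text passed as repl. If you need to inspect the match, for instance to extract information about specific captured groups, you can pass a function as the repl argument. The re module documentation has more details.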

Examples

>>> import re
>>> re.sub(r'a', 'b', 'banana')
'bbnbnb'

>>> re.sub(r'/\d+', '/{id}', '/andre/23/abobora/43435')
'/andre/{id}/abobora/{id}'
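The callable form of the replacement and the count argument can be sketched the same way. The helper name double_id below is made up for illustration; re.sub calls it once per match and substitutes whatever string it returns:

```python
import re

def double_id(match):
    # match.group(1) holds the digits captured by (\d+)
    return '/' + str(int(match.group(1)) * 2)

# Each numeric path segment is replaced by the function's return value.
result = re.sub(r'/(\d+)', double_id, '/andre/23/abobora/43435')

# count limits how many matches are replaced (here, only the first).
first_only = re.sub(r'a', 'b', 'banana', count=1)

print(result)
print(first_only)
```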

score 7 · Accepted Answer

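You can use the re module for regular expressions, but for what you want, a regex is probably overkill. I would probably try something like

z.write(article[:article.index("</html>") + 7])

This is much cleaner, and should be much faster than a regex-based solution.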
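One caveat with the slicing approach: str.index raises ValueError when the tag is absent, and the 7 is just len("</html>"). A slightly more defensive sketch, where the helper name truncate_after and the choice to return the text unchanged when no tag is found are both assumptions:

```python
def truncate_after(text, tag='</html>'):
    """Return text cut off right after the first occurrence of tag."""
    pos = text.find(tag)          # find returns -1 when the tag is absent
    if pos == -1:
        return text               # assumption: leave the text untouched
    return text[:pos + len(tag)]  # len(tag) replaces the hard-coded 7

print(truncate_after("<html>a</html>junk"))
print(truncate_after("no closing tag here"))
```

Unlike the (?i) regex, this comparison is case-sensitive, so '</HTML>' would not be found.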

score 4 · Accepted Answer

For this particular case, if using the re module is overkill, how about using the split (or rsplit) method:

se='</html>'
z.write(article.split(se)[0]+se)

For example:

#!/usr/bin/python

article='''<html>Larala
Ponta Monta 
</html>Kurimon
Waff Moff
'''
z=open('out.txt','w')

se='</html>'
z.write(article.split(se)[0]+se)
z.close()

The output in out.txt is

<html>Larala
Ponta Monta 
</html>
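The choice between split and rsplit only matters when the tag occurs more than once, and both append the tag even if it was never in the text. A small sketch with a made-up input; str.partition is shown as an alternative that avoids the second issue:

```python
se = '</html>'
# Hypothetical input in which the closing tag appears twice.
article = '<html>A</html>B</html>C'

first = article.split(se)[0] + se         # cuts after the first tag
last = article.rsplit(se, 1)[0] + se      # cuts after the last tag

# partition returns (head, separator, tail); the separator is '' if absent.
head, sep, _tail = article.partition(se)
via_partition = head + sep

# With no tag present, split still appends one, partition does not.
no_tag_split = 'no tag'.split(se)[0] + se
no_tag_partition = ''.join('no tag'.partition(se)[:2])

print(first, last, via_partition, no_tag_split, no_tag_partition, sep='\n')
```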
