python - Python: how do I split this string with a regular expression?

Question

This is probably simple, but I'm still very new to Python.

I have a string like this:

this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page
// newlines added for readability

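I need to split this string with a regular expression. The idea is that the page-break comment sometimes has a "title" (which I use in my templates) and sometimes does not.

I tried this: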

re.split("<!--pagebreak*.?-->", str)

It only returns the items whose page break has a "title" (and it doesn't split them correctly either). What am I doing wrong here?

score 2 · Accepted Answer

Change *.? to .*?:

re.split("<!--pagebreak.*?-->", str)

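Your current regex matches "pagebrea" followed by any number of literal "k" characters, optionally followed by one arbitrary character. A page break whose title is longer than one character therefore never matches.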
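To make the difference concrete, here is a small sketch that runs both patterns against the sample from the question. The variable names (text, broken, fixed) are illustrative only, and the sample is written on one line because the question notes that the newlines were added for readability.

```python
import re

# Sample string from the question, on a single line.
text = (
    "this is page one of an article "
    "<!--pagebreak page two --> this is page two "
    "<!--pagebreak--> this is the third page "
    "<!--pagebreak page four --> last page"
)

# Original pattern: "k*" is zero or more "k", ".?" is at most one arbitrary
# character, so only the untitled marker "<!--pagebreak-->" can match.
broken = re.split(r"<!--pagebreak*.?-->", text)

# Corrected pattern: ".*?" lazily consumes the optional title
# up to the first "-->".
fixed = re.split(r"<!--pagebreak.*?-->", text)

print(len(broken))  # 2
print(len(fixed))   # 4
```

With the original pattern the string is cut only at the untitled marker, which is consistent with the behaviour described in the question.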

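Also, I'd recommend using raw strings (r"...") for your regular expressions. It isn't necessary in this case, but it's an easy way to save yourself some headaches.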
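As an illustration of the kind of headache a raw string avoids, consider a pattern that uses the word-boundary assertion \b. This example is not from the question; the sample text and variable names are made up for the demonstration.

```python
import re

sample = "page two"

# In a normal string literal, "\b" is a backspace character (0x08),
# so the regex engine never sees a word-boundary assertion.
plain = re.findall("\bpage\b", sample)

# In a raw string the backslash reaches the regex engine intact.
raw = re.findall(r"\bpage\b", sample)

print(plain)  # []
print(raw)    # ['page']
```

The pagebreak pattern contains no backslashes, which is why the raw string makes no difference there.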

score 2 · Accepted Answer

You swapped the . and the *. The correct regex is:

<!--pagebreak.*?-->

answered 2012-10-04T08:41:35.020

score 2 · Accepted Answer

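This is definitely a case of the . and the * being swapped. The "." matches any character (except a newline by default), and the asterisk repeats it zero or more times. The non-greedy qualifier "?" makes the match take as few characters as possible, so it stops at the first "-->".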

import re

s = """this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page"""

print(re.split(r'<!--pagebreak.*?-->', s))

Output:

['this is page one of an article \n', ' this is page two \n', ' this is the third page \n', ' last page']
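The non-greedy "?" matters most when the whole string sits on one line, as the question says the real input does. In the multi-line sample above, "." stops at each newline, which hides the difference. The following sketch compares the greedy and non-greedy forms on a single-line version of the sample; the variable names are illustrative.

```python
import re

# Single-line version of the sample string.
one_line = (
    "this is page one of an article "
    "<!--pagebreak page two --> this is page two "
    "<!--pagebreak--> this is the third page "
    "<!--pagebreak page four --> last page"
)

# Greedy ".*" runs from the first "<!--pagebreak" to the LAST "-->",
# swallowing the two middle pages.
greedy = re.split(r"<!--pagebreak.*-->", one_line)

# Non-greedy ".*?" stops at the first "-->" after each marker.
lazy = re.split(r"<!--pagebreak.*?-->", one_line)

print(greedy)     # ['this is page one of an article ', ' last page']
print(len(lazy))  # 4
```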
