python - Regex split with multiple lines

Question

The following function, which splits a string on occurrences of a pattern, doesn't work when the text inside the brackets spans multiple lines:

import re
def header(text):
    authors = [i.strip() for i in re.split(r'\\and|\\thanks\{.*?\}', text, flags=re.M)]
    names = filter(None,authors)
    return '{} and {}'.format(', '.join(names[:-1]), names[-1])

print header(r"""John Bar \and Tom Foo\thanks{Testing if this works with 
multiple lines} \and Sam Baz""")

I don't know whether the regex is wrong or whether I'm using the flag incorrectly in the split function.

score 2 · Accepted Answer

re.M only affects the anchors (^ and $) in multi-line strings. What you want is re.S, which makes . match newlines.
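A quick way to see the difference between the two flags. This is only a sketch written for Python 3; the sample strings and variable names are made up for illustration and are not from the question:

```python
import re

# Hypothetical sample: the text inside the braces contains a newline.
sample = "first{a\nb} second"

# re.M only changes where ^ and $ can match; '.' still stops at a newline,
# so the braces cannot be matched across the line break.
with_multiline = re.findall(r'\{.*?\}', sample, flags=re.M)

# re.S (alias re.DOTALL) lets '.' match the newline as well.
with_dotall = re.findall(r'\{.*?\}', sample, flags=re.S)

# What re.M is actually for: ^ matches at the start of every line.
line_starts = re.findall(r'^\w+', "one\ntwo", flags=re.M)

print(with_multiline)
print(with_dotall)
print(line_starts)
```

With re.M the first search finds nothing, while re.S picks up the whole braced span including the line break.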

Answered 2013-04-30T20:14:34.010

score 2 · Accepted Answer

You should use the re.DOTALL flag:

re.S
re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
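Applied to the function from the question, the change could look roughly like this. It is a sketch, not the asker's final code: it assumes Python 3 (so print is a function and the filter call is replaced by a list comprehension, since a filter object cannot be sliced there), and the only regex-related change is swapping re.M for re.S:

```python
import re

def header(text):
    # re.S (re.DOTALL) lets '.*?' run across the line break inside \thanks{...}
    parts = re.split(r'\\and|\\thanks\{.*?\}', text, flags=re.S)
    # Strip whitespace and drop the empty pieces left between separators.
    names = [p.strip() for p in parts if p.strip()]
    return '{} and {}'.format(', '.join(names[:-1]), names[-1])

result = header(r"""John Bar \and Tom Foo\thanks{Testing if this works with
multiple lines} \and Sam Baz""")
print(result)  # John Bar, Tom Foo and Sam Baz
```

The non-greedy .*? still matters here: with re.S a greedy .* would run to the last closing brace in the whole string if there were several \thanks{...} blocks.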
