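I'm trying to use a regular expression to extract "entries" from a text file. Each line of the file is a separate entry, unless the line starts with a space, in which case it is a continuation of the previous line.
Example: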
import re
INPUT = """\
This is entry 1.
This
 is
 entry 2.
And this is entry 3.
This
 is
 entry
 4."""
OUTPUT = ["This is entry 1.",
"This\n is\n entry 2.",
"And this is entry 3.",
"This\n is\n entry\n 4."]
# What should the pattern be?
PATTERN = re.compile("(.+)(?=\n|$)")
assert PATTERN.findall(INPUT) == OUTPUT
What should PATTERN be so that it matches every entry?
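One possible answer is sketched below. It is a minimal sketch, assuming continuation lines begin with a literal space character (a line starting with a tab is treated as a new entry) and that `INPUT` contains the leading spaces implied by the expected `OUTPUT`. The key points are `re.MULTILINE`, so that `^` matches at the start of every line, and a non-capturing group `(?:...)`, so that `findall` returns the whole match rather than only the last captured group. A `re.split` variant is shown as an alternative.

```python
import re

# Continuation lines begin with a single space.
INPUT = """\
This is entry 1.
This
 is
 entry 2.
And this is entry 3.
This
 is
 entry
 4."""

OUTPUT = ["This is entry 1.",
          "This\n is\n entry 2.",
          "And this is entry 3.",
          "This\n is\n entry\n 4."]

# An entry starts at a line that does not begin with a space, then absorbs
# every following line that does begin with a space ("\n " followed by the rest).
# re.MULTILINE lets ^ match at the start of each line; the group is
# non-capturing so findall returns the full match.
PATTERN = re.compile(r"^(?! ).+(?:\n .*)*", re.MULTILINE)
assert PATTERN.findall(INPUT) == OUTPUT

# Alternative: split on every newline that is NOT followed by a space.
assert re.split(r"\n(?! )", INPUT) == OUTPUT
```

Note that the `re.split` version yields a trailing empty string if the file ends with a newline, whereas the `findall` version simply produces no match there.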