python - Python regex choking on \n

Question

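I want to use a regular expression in Python to read through text, find every instance where an <emotion> tag and a <location> tag occur in the same sentence, and then write each of those sentences to its own line in an output file: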

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    line = ''.join(str(x) for x in match)
    out.write(line + '\n')

out.close()

The problem is that if I read in a file that contains line breaks, the regex fails:

import re
out = open('out.txt', 'w')

readfile = "<location> Oklahoma </location> where the wind \n comes <emotion> sweeping </emotion> down <location> the plain </location>. And the waving wheat. It can sure smell <emotion> sweet </emotion>." 

for match in re.findall(r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))', readfile, flags=re.I):
    line = ''.join(str(x) for x in match)
    out.write(line + '\n')

out.close()

Is there any way to modify this regex so that it doesn't choke when it hits \n? I would be grateful for any advice others can offer on this.

score 1 · Accepted Answer

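Add re.S or re.DOTALL (they are the same thing) to the flags of your regex. By default, . matches any character except a newline, so the lookaheads cannot scan past the \n inside the sentence; with this flag, . matches newlines as well. So the new value of the flags argument is re.I | re.S.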
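As a rough check of the suggestion above, here is a small sketch that runs the pattern from the question against the sample text containing \n, once with re.I alone and once with re.I | re.S. The helper names find_sentences and one_line are made up for this illustration. Collapsing the embedded newline before writing is an extra assumption on my part, based on the stated goal of one sentence per output line.

```python
import re

# Pattern copied unchanged from the question.
PATTERN = (r'(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\bemotion>(?=\s|\.|$))'
           r'(?=(?:(?!\.(?:\s|$)).)*?\blocation>(?=\s|\.|$)).*?\.(?=\s|$))')

text = ("<location> Oklahoma </location> where the wind \n comes "
        "<emotion> sweeping </emotion> down <location> the plain </location>. "
        "And the waving wheat. It can sure smell <emotion> sweet </emotion>.")


def find_sentences(source, flags):
    """Hypothetical helper: return every sentence captured by PATTERN."""
    return re.findall(PATTERN, source, flags=flags)


def one_line(sentence):
    """Hypothetical helper: collapse whitespace runs (including \\n) to single spaces."""
    return ' '.join(sentence.split())


without_dotall = find_sentences(text, re.I)        # '.' stops at the newline
with_dotall = find_sentences(text, re.I | re.S)    # '.' also matches the newline

print(without_dotall)
for sentence in with_dotall:
    print(one_line(sentence))
```

With re.I alone the list comes back empty. With re.I | re.S the first sentence is found, and the match still contains the original \n, which is why the sketch flattens it before printing.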

score 0 · Answer

Use re.DOTALL / re.S:

flags = re.DOTALL | re.I
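A minimal illustration of what that flags value changes, using a toy pattern rather than the one from the question (the pattern and input string here are made up for the example):

```python
import re

# re.S is just the short spelling of re.DOTALL.
same_flag = (re.S == re.DOTALL)

flags = re.DOTALL | re.I

# With DOTALL, '.' matches the newline between 'A' and 'B'.
with_dotall = re.findall(r'a.b', 'A\nB', flags)

# Without DOTALL, '.' refuses to match the newline, so nothing is found.
without_dotall = re.findall(r'a.b', 'A\nB', re.I)

print(same_flag, with_dotall, without_dotall)
```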

answered 2013-06-20T16:14:09.560
