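I am trying to parse the text of an email reply and remove the quoted text (and anything that follows it, including the signature).
This code currently returns: message tests On Tue, Jun 25, 2013 at 10:01 PM, Catie Brand <
I would like it to return only: message tests
What regular expression am I missing?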
import re

def format_mail_plain(value, from_address):
    # Patterns that mark the start of quoted text or a reply header.
    res = [re.compile(r'From:\s*' + re.escape(from_address), re.IGNORECASE),
           re.compile('<' + re.escape(from_address) + '>', re.IGNORECASE),
           re.compile(r'\s+wrote:', re.IGNORECASE | re.MULTILINE),
           re.compile(r'On.*?wrote:.*?', re.IGNORECASE | re.MULTILINE | re.DOTALL),
           re.compile(r'-+original\s+message-+\s*$', re.IGNORECASE),
           re.compile(r'from:\s*$', re.IGNORECASE),
           re.compile(r'^>.*$', re.IGNORECASE | re.MULTILINE)]
    whitespace_re = re.compile(r'\s+')
    lines = [line.rstrip() for line in value.split('\n')]
    result = ''
    for line in lines:
        # Each pattern is tested against one line at a time;
        # stop at the first line that matches any of them.
        for reg_ex in res:
            if reg_ex.search(line):
                return result
        if not whitespace_re.match(line):
            if result == '':
                result += line
            else:
                result += '\n' + line
    return result
************************ Sample Text *****************************
message tests
On Tue, Jun 25, 2013 at 10:01 PM, XXXXX XXXX <
conversations+yB1oupeCJzMOBj@xxxx.com> wrote:
> **
> [image: Krow] <http://www.krow.com/>
************************ Result **********************************
message tests
On Tue, Jun 25, 2013 at 10:01 PM, XXXXX XXXX <
I would prefer the result to be:
************************ Result **********************************
message tests
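
A likely cause, judging from the sample above: the patterns are tested one line at a time, and the "On ... wrote:" header is wrapped over two lines, so the first half never contains "wrote:" and is kept. Below is a minimal sketch of one possible approach, which searches the whole message instead and cuts at the earliest marker. The names `QUOTE_MARKERS` and `strip_quoted_reply` are hypothetical, the 300-character limit is an arbitrary assumption, and the patterns are heuristics rather than a complete list of reply formats.

```python
import re

# Hypothetical helper, not the original function: the patterns run against
# the full text, so a header wrapped over several lines can still match.
QUOTE_MARKERS = [
    # "On <date>, <name> <address> wrote:", possibly spanning line breaks.
    # The 300-character cap (an assumption) keeps the match from running away.
    re.compile(r'^On\b.{0,300}?\bwrote:\s*$',
               re.IGNORECASE | re.MULTILINE | re.DOTALL),
    # "-----Original Message-----" separator line.
    re.compile(r'^-+\s*original\s+message\s*-+\s*$',
               re.IGNORECASE | re.MULTILINE),
    # A forwarded or quoted "From:" header at the start of a line.
    re.compile(r'^From:\s', re.IGNORECASE | re.MULTILINE),
    # Any line quoted with ">".
    re.compile(r'^>', re.MULTILINE),
]


def strip_quoted_reply(text):
    """Return the text that precedes the earliest quote marker."""
    cut = len(text)
    for marker in QUOTE_MARKERS:
        match = marker.search(text)
        if match:
            cut = min(cut, match.start())
    return text[:cut].rstrip()


sample = (
    "message tests\n"
    "\n"
    "On Tue, Jun 25, 2013 at 10:01 PM, XXXXX XXXX <\n"
    "conversations+yB1oupeCJzMOBj@xxxx.com> wrote:\n"
    "> **\n"
    "> [image: Krow] <http://www.krow.com/>\n"
)
print(strip_quoted_reply(sample))  # message tests
```

Because the header pattern is anchored to the start of a line and must end a line with "wrote:", an ordinary sentence is less likely to trigger it, though a reply whose own text happens to fit that shape would still be cut short.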