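I need to parse (split) a text file containing emails exported from Outlook. I am splitting it with preg_split, using the flags PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE.
My goal is to capture the mail header section with a regex, i.e. the block that starts with the "From:" line and ends at the blank line before the message body.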
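To make the intended result concrete, here is a minimal sketch of the split on a toy string. The input and the simplified, English-only pattern are assumptions made for illustration; this is not the pattern I am actually trying to fix, and it ignores the multilingual requirement.

```php
<?php
// Toy input (an assumption, not the real export): two messages whose
// header block ends at a blank line.
$text = "From: A (X)\nSent: Mon\nTo: B (Y)\nSubject: Hi\n\nBody one\n"
      . "From: C (Z)\nSent: Tue\nTo: D (W);\nE (V)\nSubject: Re\n\nBody two\n";

// Simplified pattern: a line starting with "From:", followed by every
// non-empty line after it, then the blank line that closes the header.
$pattern = '/^(From:.*\n(?:.+\n)+)\n/m';

// DELIM_CAPTURE keeps the captured header blocks in the result;
// NO_EMPTY drops the empty piece before the first header.
$parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```

With this toy input, $parts alternates between a header block and the body that follows it, which is the shape I am after for the real file.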
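Constraints:
- Field names must be supported in multiple languages
- The number of header fields varies (CC, BCC, Attachments)
- Some fields may span more than one line (To, CC, BCC, Subject, Attachments)
The text file is preprocessed: runs of spaces and tabs are replaced with a single space, and leading and trailing spaces are stripped.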
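The preprocessing is roughly equivalent to the sketch below. The function name clean_export is hypothetical, and the line-ending normalization in the first step is an extra assumption that is not part of the description above.

```php
<?php
// Hypothetical helper approximating the preprocessing described above.
function clean_export(string $raw): string
{
    // Assumption: normalize CRLF / CR line endings to LF first.
    $text = preg_replace('/\r\n?/', "\n", $raw);
    // Collapse runs of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    // Strip the remaining leading or trailing space on each line;
    // blank lines are kept, since they separate header and body.
    $text = preg_replace('/^ | $/m', '', $text);
    return $text;
}

$text_clean = clean_export("  From:\tBlack,   Jack \r\n\r\n Dear  George, ");
```

Blank lines are deliberately left in place, because the header pattern relies on them to find the end of the header block.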
I have been working on this all day and cannot get the last part to work. It does work on gskinner's regex test page (http://regexr.com?36v27), but not in PHP.
Subject text:
From: Black, Jack (LA)
Sent: Monday, October 28, 2013 6:36 PM
To: George, Jackson (London); DCS.CC.DARWIN (Australia)
Cc: Bar, Foo (Istanbul); Ex, Reg (Istanbul); Smith, John (Istanbul); Rambo,
John J. (Gaziantep); Matrix, John (Phuket)
Subject: RE: PREVENTIVE AND CORRECTIVE ACTIONS / FOOBAR
Dear George,
venenatis imperdiet quam. Proin a egestas nunc, et mattis elit. In hac habitasse platea dictumst. Nulla dolor nibh, tempus ut neque eu, tempus fermentum mauris. Mauris nec ipsum nec sapien commodo scelerisque ut eu urna. Pellentesque eu neque in enim adipiscing faucibus. Sed interdum arcu et sem mollis iaculis. Duis euismod laoreet ligula lacinia dapibus. Vestibulum ullamcorper malesuada metus at malesuada.
Nullam enim elit, auctor vehicula orci eget, imperdiet feugiat odio. Etiam dapibus sagittis sem a varius. Nulla sit amet convallis mi, sit amet rutrum ipsum. In libero lectus, mattis at dui eu.
Thank you and best regards,
Jack B. Black (Mr)
Operations Manager (GGD)
FU Supervisor (R34, R57)
Phone: +1112212212 (local 1111)
Mobile: +12 121.111.11.12
From: George, Jackson (UK)
Sent: Monday, October 28, 2013 5:57 PM
To: DCS.CC.DARWIN (Australia)
Bar, Foo (Istanbul); Ex, Reg (Istanbul); Smith, John (Istanbul); Rambo,
John J. (Gaziantep); Matrix, John (Phuket)
Subject: PREVENTIVE AND CORRECTIVE ACTIONS / FOOBAR
Dear Colleagues,
ermentum. Duis ipsum quam, bibendum a risus nec, tincidunt fringilla lectus. Nunc vel dictum massa, et cursus nunc. Mauris tincidunt felis eget justo congue volutpat. Nulla condimentum accumsan elementum. Integer commodo, lorem eu pharetra suscipit, ligula.
Best Regards.
SDFD srfgGD
Field coordinator (GGD)
Customer Representative
sds dfsd sdfgsef sdfsd
sgzdfgdfg fgfg gdfg
Footer text etc
sdfdfdf dfgsdfgsdfgsdfg
Phone : +90 212 368 40 00 (ext:3814)
Regex:
preg_match(
'/ # delimiter
( # capturing group start
[\ A-Z][a-z]+:.+\(.+\)\R # From: field
[A-Z][a-z]+:.+\R # Sent: fields
[A-Z][a-z]+:.+\R # To: field (1st line)
(?:.+\R)+ # any additional header lines, before blank line (To, CC, BCC, Subject, Attachments)
) # capturing group end
# delimiter + modifiers /x',$text_clean, $matches);
echo '<b>Matches: '.count($matches).'</b>';
print_r($matches);
I am having trouble capturing the additional header lines:
(?:.+\R)+ # any additional header lines...
Any help is appreciated.