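With the help of Casimir et Hippolyte, I have been trying to parse some text like the example below. (Note: my original question oversimplified the sample text, because I thought I could easily adapt any solution offered to the real text. However, after much banging of fingers on keys and head on walls, I am still none the wiser.)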
This is what I have so far. I have tried escaping the data (addslashes), but I thought I would post $subject in its raw(er) form:
<?php
$subject = "
Ydqk‚_,¦#¦#À%¦#¦#¦#¦#¦#èeèe2%Ž¦#¦#¦#Cf¦#¦#¦#¦#qk¦#¦#¦#¦#¦#¦#¦#¦#¦#Ð ð:SOME COMPANY<br />
WITH A LONG NAME<br />
The Big Barn, 23 London Lane, Cheltenham, Glos. GL1 1GL<br />
Tel. 022234 567890 Fax. 02234 345678 Email. <a href= mailto:info@some.co.uk </a>info@some.co.uk<br />
Company: Another Company (AKA) – 22 London Lane, Cheltenham, GL1 2GL<br />
FAO: Mr D. Mistify/ A. Clarity/ Jo Bloggs<br />
PROJECT OMAHA <br />
<br />
<br />
CONTRACT No. 14 DATE 10/6/13 <br />
No. QUESTION ANSWER <br />
<br />
973 <br />
Hi, it's me again:<br />
I'm very, very confused. Why do regular expressions seem such a dark art?<br />
Surely it can't be as hard as I manage to make it seem?<br />
Please advise<br />
Thank you. <br />
Date Required – <br />
17/6/13 <br />
<br />
Signed for and on behalf of Some Company with a Long Name Limited<br />
Me Again – Senior Moment<br />
________________________________________________________<br />
<br />
<br />
<br />
<br />
<br />
<br />
QUESTION / ANSWER SHEET<br />
Some Company with a Long Name<br />
<br />
Question and Answer System<br />
AA414<br />
’“¸¹ÉÊËÌÔ...descends into gibberish...
";
$pattern = '~
Project\hNo\.\h\d++\hDATE\h
(?<date>\d{1,2}\/\d{1,2}\/\d{1,2})
\s++No\.\hQUESTION\hANSWER\s++
(?<No>\d++)\s++
# all characters but D or D not followed by "ate Required"
(?<desc>(?>[^D]++|D(?!ate\hRequired))+)
\D++
(?<date_required>\d{1,2}\/\d{1,2}\/\d{1,2})
~x';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
print_r($matches);
?>
I would like to extract the following:
- Issue date (10/6/2013) (dd/mm/yyyy)
- Question number (973)
- Description
- Date required (17/6/2013) (dd/mm/yyyy)
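
For reference, here is a rough sketch of one way the four fields above might be pulled out. It is not a confirmed solution, and it rests on several assumptions: the header literally reads "CONTRACT No." as in the sample, the fields are separated by `<br />` tags, two-digit years belong to the 2000s, and the description ends at the first "Date Required". The function names `expandYear` and `extractSheets` are made up for illustration.

```php
<?php
// Hypothetical helper, not from the original post: expands a two-digit year,
// assuming every date falls in the 2000s.
function expandYear(string $date): string
{
    [$day, $month, $year] = explode('/', $date);
    if (strlen($year) === 2) {
        $year = '20' . $year;
    }
    return "$day/$month/$year";
}

// Hypothetical helper: returns one array of fields per question sheet found in $raw.
function extractSheets(string $raw): array
{
    // The fields are separated by <br /> tags, so turn those into plain newlines first.
    $text = preg_replace('~<br\s*/?>~i', "\n", $raw) ?? $raw;

    $pattern = '~
        CONTRACT\h+No\.\h*\d+\h+DATE\h+
        (?<date>\d{1,2}/\d{1,2}/(?:\d{2}){1,2})
        \s+No\.\h+QUESTION\h+ANSWER\s+
        (?<no>\d+)\s+
        (?<desc>.*?)                      # lazy: stops at the first "Date Required"
        \s*Date\h+Required\b\D*
        (?<date_required>\d{1,2}/\d{1,2}/(?:\d{2}){1,2})
    ~xis';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $sheets = [];
    foreach ($matches as $m) {
        $sheets[] = [
            'date'          => expandYear($m['date']),
            'no'            => (int) $m['no'],
            // Collapse the line breaks inside the description into single spaces.
            'desc'          => trim(preg_replace('~\s+~', ' ', $m['desc'])),
            'date_required' => expandYear($m['date_required']),
        ];
    }
    return $sheets;
}

// Shortened stand-in for the real $subject, keeping only the relevant lines.
$sample = "CONTRACT No. 14 DATE 10/6/13 <br />\nNo. QUESTION ANSWER <br />\n<br />\n973 <br />\n"
        . "Please advise<br />\nThank you. <br />\nDate Required – <br />\n17/6/13 <br />\n";
print_r(extractSheets($sample));
```

Replacing the `<br />` tags before matching keeps the pattern itself free of markup, and the lazy `.*?` with the `s` modifier lets the description span several lines. If the real header says something other than "CONTRACT", that literal would need adjusting.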