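I want to use preg_match_all in PHP to capture each of the following in its own group:
- The chapter, section, or page keyword
- The number (and letter, if there is one) of that chapter, section, or page. A space between the keyword, the number, and the letter should be allowed for
- The words "and" and "or"
Keep in mind that I want to ignore all titles, and that the number of items in the string can vary. The regex should work for all of the following examples:
- Ch1 and Sect2b
- Ch 4 x unwanted title and Sect 5y unwanted title and Sect6 z and Ch7 or Ch8
This is what I have managed to come up with so far: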
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches);
print_r($matches);
This only matches the last item:
Array
(
[0] => Array
(
[0] => Pg3
)
[1] => Array
(
[0] => Pg
)
[2] => Array
(
[0] => 3
)
[3] => Array
(
[0] =>
)
[4] => Array
(
[0] =>
)
)
The expected result should be:
Array
(
[0] => Array
(
[0] => Ch 1 a and
[1] => Sect 2b and
[2] => Pg3
)
[1] => Array
(
[0] => Ch
[1] => Sect
[2] => Pg
)
[2] => Array
(
[0] => 1
[1] => 2
[2] => 3
)
[3] => Array
(
[0] => a
[1] => b
[2] =>
)
[4] => Array
(
[0] => and
[1] => and
[2] =>
)
)
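For reference, here is a minimal sketch of one pattern that seems to produce groups 1 to 4 as listed above. It rests on assumptions that are not stated in the question: the keywords are limited to Ch, Sect, and Pg, the optional letter is a single letter standing on its own right after the number, and the function name parse_refs is only a hypothetical helper. Note that the full match in index 0 would still contain the title text, since a single match cannot skip characters in the middle.

```php
<?php
// Hypothetical helper: returns the preg_match_all result for a reference string.
function parse_refs(string $str): array
{
    $pattern = '/\b(Ch|Sect|Pg)'   // group 1: keyword (assumed fixed list)
        . '\s*(\d+)'               // group 2: number, optional space before it
        . '(?:\s*([a-z])\b)?'      // group 3: optional single standalone letter
        . '.*?'                    // skip the unwanted title lazily
        . '(?:\b(and|or)\b|$)'     // group 4: conjunction, or end of string
        . '/i';
    preg_match_all($pattern, $str, $matches);
    return $matches;
}

$matches = parse_refs('Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3');
print_r(array_slice($matches, 1)); // keyword, number, letter, conjunction
```

With the default PREG_PATTERN_ORDER flag, groups that did not take part in a match are filled with empty strings, which is why the last item gets an empty letter and an empty conjunction.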