Using PHP, I want to extract an array from a string that contains a numbered list.

Example string:

The main points are: 1. This is point one. 2. This is point two. 3. This is point three.

should produce the following array:

[0] => 1. This is point one.
[1] => 2. This is point two.
[2] => 3. This is point three.

The format of the string can vary, for example:

1. This is point one, 2. This is point two, 3. This is point three.
1) This is point one  2) This is point two 3) This is point three
1 This is point one. 2 This is point two. 3 This is point three.

I have started out using preg_match_all with the following pattern:

!((\d+)(\s+)?(\.?)(\)?)(-?)(\s+?)(\w+))!

but I'm not sure how to match the rest of the string, up to the next match.

Example on RegExr

2 Answers

If your input follows your sample input, in that each "point" does not itself contain a digit, you can use the following regular expression:

\d+[^\d]*

In PHP, you can use preg_match_all() to capture everything:

$text = 'The main points are: 1. This is point one. 2. This is point two. 3. This is point three.';

$matches = array();
preg_match_all('/(\d+[^\d]*)/', $text, $matches);

print_r($matches[1]);

This results in:

Array
(
    [0] => 1. This is point one.
    [1] => 2. This is point two.
    [2] => 3. This is point three.
)

However, this will not work if the points themselves contain any digits.
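
To see the limitation concretely, here is a minimal sketch. The input string is a made-up example (not taken from the question) in which the second point contains a digit:

```php
<?php

// Hypothetical input: the second point contains the digit "2" ("version 2").
$text = '1. Install PHP. 2. Upgrade to version 2 today. 3. Done.';

// Same pattern as above: a number followed by any run of non-digit characters.
preg_match_all('/\d+[^\d]*/', $text, $matches);

// Each match stops at the next digit, wherever that digit occurs.
print_r($matches[0]);
```

The second point is cut in two at the embedded `2`, so four fragments come back instead of three.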

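If you expect digits to appear inside the points, you need to define an actual "anchor" or "end" for each point, such as a period. If you can guarantee that a `.` only appears at the end of a point (ignoring the one that may follow the leading number), you can use the following regular expression: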

\d+[.)\s][^.]*\.

This can easily be dropped into the preg_match_all() call from above:

preg_match_all('/(\d+[.)\s][^.]*\.)/', $text, $matches);

Regex explained:

\d+        # leading number
[.)\s]     # followed by a `.`, `)`, or whitespace
[^.]*      # any non-`.` character(s)
\.         # ending `.`

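The caveat with the second regex is that a `.` may only appear at the end of each point (and directly after the leading number). Still, I think this rule may be easier to follow than the "no digits" rule, but it all depends on your actual input.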
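
As a rough illustration of both the benefit and the caveat, the sketch below runs the second regex on a made-up string whose points contain digits, and then on the question's `1)` variant, which has no periods at all. The variable names are only for illustration:

```php
<?php

$pattern = '/\d+[.)\s][^.]*\./';

// Hypothetical input: digits appear inside the points, each point ends with ".".
$text = '1. Buy 2 apples. 2. Eat 1 apple. 3. Done.';
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);

// Variant from the question without any periods: nothing can match,
// because the pattern requires a closing "." for every point.
$noPeriodText = '1) This is point one  2) This is point two 3) This is point three';
preg_match_all($pattern, $noPeriodText, $noPeriods);
print_r($noPeriods[0]);
```

The first call returns the three points intact, digits included. The second returns an empty array, which is the caveat in practice: input without terminating periods needs a different end marker.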

Answered 2012-11-06T06:00:28.830
It is easier with preg_split: just split the string on your numbering format and return the non-empty results. Modify it to suit your needs:

http://codepad.org/tK6fGCRB

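<?php

// Split on a list marker: a number followed by ".", ")", or a space.
// \d+ (instead of \d) also handles multi-digit markers such as "10.".
$theReg = '/\d+\.|\d+\)|\d+ /';
$theStrs = array(
    '1. This is point one, 2. This is point two, 3. This is point3',
    '1) This is point one  2) This is point two 3) This is point 3',
    '1 This is point one. 3 This is point three. 4 This is point 4',
);

foreach ($theStrs as $str) {
    // PREG_SPLIT_NO_EMPTY drops the empty piece before the first marker.
    print_r(preg_split($theReg, $str, -1, PREG_SPLIT_NO_EMPTY));
}
?>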
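
Note that splitting on the marker discards the numbers themselves. If the numbers should stay attached to each point, as in the question's expected output, one possible variation is to split on a zero-width lookahead instead. This is only a sketch and not part of the snippet above: the helper name `splitNumberedList` is hypothetical, and the pattern assumes that a number inside a point is never directly followed by `.`, `)`, or whitespace.

```php
<?php

// Hypothetical helper: splits *before* each list marker so the number is kept.
function splitNumberedList(string $text): array
{
    // (?<!\d)        not in the middle of a multi-digit number
    // (?=\d+[.)\s])  next comes a number followed by ".", ")", or whitespace
    $pieces = preg_split('/(?<!\d)(?=\d+[.)\s])/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Trim the whitespace left over between points.
    return array_map('trim', $pieces);
}

$parenList = splitNumberedList('1) This is point one  2) This is point two 3) This is point three');
$multiDigit = splitNumberedList('9) nine 10) ten 11) eleven');

print_r($parenList);
print_r($multiDigit);
```

The lookbehind `(?<!\d)` is there so that a marker such as `10)` is not split between the `1` and the `0`.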
Answered 2012-11-06T06:03:23.460