
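I want to split a paragraph string into an array of sentences. Naturally, I used a regular expression that splits on the period character (.). The problem is that the sentences contain abbreviated academic titles, and each abbreviation ends with a period (.). As a result, my regular expression splits the paragraph in the wrong places.

Here is an example paragraph:

Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time. Present in that general audience were the Deputy Dean of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin.

Using only the period (.) as the regular expression, I get: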

Array (
[0] => Meanwhile Rector of Bogor Agricultural University, Prof
[1] => Dr
[2] => Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time
[3] => ...
)
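
For reference, the over-splitting can be reproduced with a sketch like the one below. The shortened `$text` is a stand-in chosen for illustration, not the full paragraph, and `preg_split('/\./', ...)` is assumed to be equivalent to the split described above.

```php
<?php
// Shortened sample text (an assumption for illustration).
$text = "The rector, Prof. Dr. Herry Suhardiyanto, spoke first. Dr. Marimin spoke next.";

// Splitting on every literal period also cuts after "Prof" and "Dr".
$parts = preg_split('/\./', $text);
print_r($parts);
```

Each title ends up at the end of one fragment or alone in its own fragment, which is the behaviour that needs to be avoided.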

This is what I actually want:

Array (
[0] => Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks requested that the graduate students should keep on studying and will finalize their studies on time
[1] => Present in that general audience were the Deputy Dean of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin
)

2 Answers


You can use negative lookbehinds:

((?<!Prof)(?<!Dr)(?<!Mr)(?<!Mrs)(?<!Ms))\.

Add more titles if needed.

Demo with explanation: http://regex101.com/r/xQ3xF9

The code could look like this:

$text="Meanwhile Rector of Bogor Agricultural University, Prof. Dr. Herry Suhardiyanto, in his remarks about Mr. John requested that the graduate students should keep on studying and will finalize their studies on time. Present in that general audience were Mrs. Peterson of the Graduate School of Bogor Agricultural University, Dr.Dedi Jusadi, Secretary of the Graduate School for Doctoral Program of Bogor Agricultural University, Prof.Dr. Marimin.";

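// One negative lookbehind per title; a period preceded by any of these is not a split point.
$titles=array('(?<!Prof)', '(?<!Dr)', '(?<!Mr)', '(?<!Mrs)', '(?<!Ms)');
// Concatenate the lookbehinds in front of the literal period and split on the result.
$sentences=preg_split('/('.implode('',$titles).')\./',$text);
print_r($sentences);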
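
The split above leaves a leading space on the second sentence and an empty last element, because only the period itself is consumed. A possible refinement, not part of the original answer, is sketched below. The helper name `splitSentences` and its default title list are assumptions made for illustration. It adds `\b` inside each lookbehind so that only whole words count as titles, swallows the whitespace after the period, and drops empty pieces with `PREG_SPLIT_NO_EMPTY`.

```php
<?php
// Hypothetical helper; the name and the default title list are assumptions.
function splitSentences(string $text, array $titles = ['Prof', 'Dr', 'Mr', 'Mrs', 'Ms']): array
{
    // Build one negative lookbehind per title, e.g. (?<!\bDr).
    $guards = '';
    foreach ($titles as $title) {
        $guards .= '(?<!\b' . preg_quote($title, '/') . ')';
    }
    // Split on a period that is not preceded by a title,
    // consuming any whitespace that follows the period.
    $pattern = '/' . $guards . '\.\s*/';
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = "The rector, Prof. Dr. Herry Suhardiyanto, spoke first. Mrs. Peterson and Dr.Dedi Jusadi attended.";
$result = splitSentences($text);
print_r($result);
```

With this sample text the sketch should yield two sentences, with the titles kept intact and no empty trailing element.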
answered 2013-03-09T04:51:03.403

This seems to work, although it is an extra PHP post-processing step rather than a pure regex solution:

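// Fragments left over after splitting on "." alone.
$begin = array( 0=>'Meanwhile in geography,',
            1=>'Dr',
            2=>'Henry Suhardiyanto, in his remarks, stated that ',
            3=>'Dr',
            4=>'Prof',
            5=>'Jedi Dusadi was another ',
            6=>'Prof');

// Titles whose trailing period should not end a sentence.
$exclusions = array("Dr", "Prof", "Mr", "Mrs");

$result = array();
$prefix = '';
foreach ($begin as $fragment) {
    if (in_array($fragment, $exclusions)) {
        // Hold the title back and glue it onto the next fragment.
        $prefix .= $fragment . ". ";
        continue;
    }
    $result[] = $prefix . $fragment;
    $prefix = '';
}
if ($prefix !== '') {
    // A trailing title with nothing after it is kept as its own entry.
    $result[] = rtrim($prefix);
}
print_r($result);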
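
The loop above assumes each title arrives as its own fragment. When a real paragraph is split on periods, a title more often sits at the end of a longer fragment (for example "..., Prof"). The sketch below is one way to handle that case, under stated assumptions: the helper name `mergeTitleFragments` and its title list are hypothetical, and a fragment is treated as unfinished whenever its last whitespace-separated word is a title.

```php
<?php
// Hypothetical helper; the name and the title list are assumptions.
function mergeTitleFragments(array $fragments, array $titles = ['Dr', 'Prof', 'Mr', 'Mrs', 'Ms']): array
{
    $sentences = [];
    $buffer = '';
    foreach ($fragments as $fragment) {
        $buffer .= $fragment;
        // Last whitespace-separated word of the text collected so far.
        $words = preg_split('/\s+/', trim($buffer));
        $last = end($words);
        if (in_array($last, $titles, true)) {
            // The period belonged to a title: restore it and keep collecting.
            $buffer .= '.';
            continue;
        }
        if (trim($buffer) !== '') {
            $sentences[] = trim($buffer);
        }
        $buffer = '';
    }
    // Keep anything left over, such as a paragraph that ends with a title.
    if (trim($buffer) !== '') {
        $sentences[] = trim($buffer);
    }
    return $sentences;
}

$text = "The rector, Prof. Dr. Herry Suhardiyanto, spoke first. Dr. Marimin spoke next.";
$merged = mergeTitleFragments(explode('.', $text));
print_r($merged);
```

Because the period is restored without adding a space, forms such as "Dr.Dedi" in the source text come back unchanged.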
answered 2013-03-09T05:05:57.347