I want to parse some text. It contains an odd sentence like B R I E F I N G S I N B I O I N F O R M A T I C S, and I want to skip that sentence. Here is my code:

$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach(preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence){
    foreach(preg_split('/\s+/', $sentence) as $words){
       if (count(strlen($words)>1)){
        // I don't know what to do here
       }
    }
}

But it is still wrong. How can I recognize a sentence that follows the pattern B R I E F I N G S I N B I O I N F O R M A T I C S? Thank you.


3 Answers


How about this? It works when every word in the sentence is a single character long.

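$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

// Split into sentences at whitespace that follows ., ! or ? (optionally followed by a quote).
$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    // Assume the sentence is strange until a word longer than one character turns up.
    $isStrange = true;
    foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Strip sentence punctuation so that the final "S." counts as a single letter.
        if (strlen(trim($word, '.!?\'"')) > 1) {
            $isStrange = false;
            break;
        }
    }
    if ($isStrange) echo $sentence . ' is very strange!';
}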
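
If the goal is to skip the odd sentence rather than just report it, the same single-character check can be wrapped in a function that returns the remaining text. This is only a minimal sketch: the name `dropSpacedOutSentences` is made up for illustration, and it assumes that sentence punctuation (`.`, `!`, `?`, quotes) should be ignored when measuring a word, because the last token of the odd sentence is `S.` rather than `S`.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the original answer):
// returns the text with every sentence made only of single-character words removed.
function dropSpacedOutSentences(string $text): string
{
    // Split into sentences at whitespace that follows ., ! or ? (optionally a quote).
    $sentences = preg_split('/(?<=[.!?]|[.!?][\'"])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array_filter($sentences, function (string $sentence): bool {
        foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            // Ignore sentence punctuation so that "S." counts as one character.
            if (strlen(trim($word, '.!?\'"')) > 1) {
                return true; // a normal word was found: keep the sentence
            }
        }
        return false; // only single-character words: drop the sentence
    });

    return implode(' ', $kept);
}

$text = 'B R I E F I N G S. Because many biomedical entities have multiple names.';
echo dropSpacedOutSentences($text), "\n";
// prints: Because many biomedical entities have multiple names.
```

Note that a genuine one-word sentence such as `I.` would be dropped as well; requiring at least two words before dropping would avoid that.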
answered 2012-11-01T09:51:10.787


$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

$text = str_replace("B R I E F I N G S I N B I O I N F O R M A T I C S. ","",$text); // <--- added this

$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    foreach (preg_split('/\s+/', $sentence) as $word) {
        // Only handle words longer than one character.
        if (strlen($word) > 1) {
            // process the word here
        }
    }
}
answered 2012-11-01T09:51:43.297


echo preg_replace('/^[A-Z](?:\s[A-Z])+\./', '', $text);
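
The pattern above is anchored with `^`, so it only removes a run at the very start of `$text`. A possible variation, sketched below, drops the anchor so that such a run is removed wherever it appears, and also swallows the whitespace after it. The function name `stripSpacedOutRuns` is hypothetical, and the pattern is a heuristic: it assumes the letters are capitals separated by single spaces and that the run ends in `.`, `!` or `?`.

```php
<?php
// Hypothetical helper name; the pattern is a heuristic, not a general sentence detector.
function stripSpacedOutRuns(string $text): string
{
    // (?<!\S)           the run starts at the beginning of the text or right after whitespace
    // [A-Z](?: [A-Z])+  at least two capital letters separated by single spaces
    // [.!?]             the run is closed by sentence punctuation
    // \s*               also remove the whitespace that follows the run
    $pattern = '/(?<!\S)[A-Z](?: [A-Z])+[.!?]\s*/';

    // preg_replace() returns null on a regex error; fall back to the input in that case.
    return preg_replace($pattern, '', $text) ?? $text;
}

echo stripSpacedOutRuns('B R I E F I N G S. Because many entities have multiple names.'), "\n";
// prints: Because many entities have multiple names.
```

Because the pattern looks only at the letters themselves, it would also strip the tail of an ordinary sentence that happens to end in spaced capitals (for example `PLAN A B C.`), so splitting into sentences first is the safer route when that can occur.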
answered 2012-11-01T10:00:21.450