I have this text:

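A man’s jacket is of green color. He – the biggest star in modern history – rides bikes very fast (230 km per hour). How is it possible?! What kind of bike is he using? The semi-automatic gear of his bike, which is quite expensive, significantly helps to reach that speed. Some (or maybe many) claim that he is the fastest in the world! “I saw him ride the bike!” Mr. John Deer speaks. “The speed he sets is 133.78 kilometers per hour,” which sounds incredible; sounds deceiving.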

I want the following result array:

words[1] = "A"
words[2] = "man's"
words[3] = "jacket"
...
words[n+1] = "color"
words[n+2] = "."
words[n+3] = "He"
words[n+4] = "-"
words[n+5] = "the"
...

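The array should contain every word and every punctuation mark as a separate element. Can this be done with a regular expression? Could anyone help me write it? Thanks!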

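Edit: As requested, here is my work so far. I am processing the text with the following function, but I would like to do the same thing with a regular expression: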

$text = explode(' ', $this->rawText);
$marks = Array('.', ',', ' ?', '!', ':', ';', '-', '--', '...');
for ($i = 0, $j = 0; $i < sizeof($text); $i++, $j++) {
    $skip = false;
    //check if the word contains punctuation mark
    foreach ($marks as $value) {
        $markPosition = strpos($text[$i], $value);
        //if it does, separate the punctuation mark from the word
        if ($markPosition !== FALSE) {
            //check the position of the punctuation mark - if it's 0 then it's probably a punctuation mark by itself, for example a dash
            if ($markPosition === 0) {
                //add separate mark to array
                $words[$j] = new Word($j, $text[$i], 2, $this->phpMorphy);
            } else {
                $words[$j] = new Word($j, substr($text[$i], 0, strlen($text[$i]) - 1), 0, $this->phpMorphy);
                //add separate mark to array
                $punctMark = substr($text[$i], -1);
                $j += 1;
                $words[$j] = new Word($j, $punctMark, 1, $this->phpMorphy);
            }
            $skip = true;
            break;
        }
    }
    if (!$skip) {
        $words[$j] = new Word($j, $text[$i], 0, $this->phpMorphy);
    }
}

2 Answers

The following will split the text as required for your specific sample.

$words = preg_split('/(?<=\s)|(?<=\w)(?=[.,:;!?()-])|(?<=[.,!()?\x{201C}])(?=[^ ])/u', $text);
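Because the first alternative splits after whitespace rather than on it, the resulting tokens keep their trailing spaces. A minimal sketch of how the pieces might be tidied up, assuming a hypothetical helper name `splitWordsAndMarks` that is not part of the answer above:

```php
<?php
// Hypothetical helper (not from the answer): wraps the answer's pattern
// and cleans up the pieces it produces.
function splitWordsAndMarks(string $text): array
{
    $pattern = '/(?<=\s)|(?<=\w)(?=[.,:;!?()-])|(?<=[.,!()?\x{201C}])(?=[^ ])/u';
    $pieces = preg_split($pattern, $text);
    if ($pieces === false) {
        return []; // preg_split failed, e.g. on invalid UTF-8 with the /u modifier
    }
    // Splitting after whitespace leaves trailing spaces on tokens, so trim them.
    $pieces = array_map('trim', $pieces);
    // Drop tokens that consisted of whitespace only, then reindex from 0.
    return array_values(array_filter($pieces, function ($p) {
        return $p !== '';
    }));
}

print_r(splitWordsAndMarks('He rides fast (230 km). Wow!'));
// Tokens: He, rides, fast, (, 230, km, ), ., Wow, !
```

Note that the pattern is tailored to the sample text: a decimal such as 133.78 is split into three tokens (133, ., 78), and a hyphenated word such as semi-automatic becomes semi and -automatic.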

answered 2013-11-04T14:26:31.360

Try making use of preg_split. Pass the punctuation marks of your choice inside the square brackets [ and ].

<?php
$str="A man’s jacket is of green color. He – the biggest star in modern history – rides bikes very fast (230 km per hour). How is it possible?! What kind of bike is he using? The semi-automatic gear of his bike, which is quite expensive, significantly helps to reach that speed. Some (or maybe many) claim that he is the fastest in the world! “I saw him ride the bike!” Mr. John Deer speaks. “The speed he sets is 133.78 kilometers per hour,” which sounds incredible; sounds deceiving.";

$keywords=preg_split("/[-,. ]/", $str);

print_r($keywords);

Output:

Array ( [0] => A [1] => man’s [2] => jacket [3] => is [4] => of [5] => green [6] => color [7] => [8] => He [9] => – [10] => the [11] => biggest [12] => star [13] => in [14] => modern [15] => history [16] => –

Message truncated to prevent abuse of resources... Shankar ;)
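As the output shows, characters matched by the pattern are consumed as delimiters: the period after "color" is gone and an empty element takes its place. One way to keep the marks is to capture them and ask preg_split to return the captured delimiters. This is a sketch only; the function name `tokenize` and the extended set of marks are assumptions, not part of the answer above:

```php
<?php
// Hypothetical variant: keep punctuation as separate tokens instead of dropping it.
function tokenize(string $text): array
{
    // Group 1 captures a single ASCII punctuation mark; runs of whitespace
    // are plain delimiters and are not captured.
    $pattern = '/([-,.;:!?()])|\s+/';
    // PREG_SPLIT_DELIM_CAPTURE returns the captured marks as elements;
    // PREG_SPLIT_NO_EMPTY drops the empty strings between adjacent delimiters.
    $tokens = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

print_r(tokenize('A jacket, of green color. How?!'));
// Tokens: A, jacket, ",", of, green, color, ".", How, "?", "!"
```

This only handles the ASCII marks listed in the character class. Typographic characters from the sample text, such as the en dash and curly quotes, would need to be added to the class together with the /u modifier.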

answered 2013-11-04T13:02:00.680