I am trying to find the sentences between a pipe (|) and a dot (.), for example:

| This is one. This is two.

The regex pattern I use:

preg_match_all('/(:\s|\|+)(.*?)(\.|!|\?)/s', $file0, $matches);

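So far I have not been able to capture both sentences. The regex I use captures only the first sentence.

How can I fix this?

Edit: as you can see from the regex, I am trying to find the sentences BETWEEN (: or |) AND (. or ! or ?)

A colon or a pipe marks the start of the sentences. The sentences may be: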

: Sentence one. Sentence two. Sentence three. 
| Sentence one. Sentence two? 
| Sentence one. Sentence two! Sentence three?

4 Answers

I would keep it simple and just match:

\s*[^.|]+\s*

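This says to match any run of characters that contains no pipe or full stop; the \s* on either side consumes optional whitespace before and after each sentence.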

$input = "| This is one. This is two.";
preg_match_all('/\s*[^.|]+\s*/s', $input, $matches);
print_r($matches[0]);

This prints:

Array
(
    [0] =>  This is one
    [1] =>  This is two
)
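
The matches above still carry a leading space, because the whitespace is part of the overall match. A minimal sketch, assuming trimmed strings are wanted and that the colon, ! and ? from the question should also act as delimiters (the widened character class and the $sentences variable are illustrative additions, not part of the answer):

```php
<?php
// Hypothetical input, taken from the question's sample lines.
$input = "| Sentence one. Sentence two! Sentence three?";

// Match runs of characters that are neither a start marker (| or :)
// nor a sentence terminator (. ! ?).
preg_match_all('/[^.!?|:]+/', $input, $matches);

// Trim each match and drop anything that was only whitespace.
$sentences = array_values(
    array_filter(array_map('trim', $matches[0]), 'strlen')
);

print_r($sentences);
```

One caveat of this sketch: a colon inside a sentence would also split it, so it only suits input where the colon appears as a start marker.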
answered 2020-01-18T10:20:18.243

This does the job:

$str = '| This is one. This is two.';
preg_match_all('/(?:\s|\|)+(.*?)(?=[.!?])/', $str, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => | This is one
            [1] =>  This is two
        )

    [1] => Array
        (
            [0] => This is one
            [1] => This is two
        )

)
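
To see why the pattern in the question stops after the first sentence: it requires a colon plus whitespace, or a pipe, in front of every match, while the second sentence is preceded only by a space. A small sketch comparing the two patterns on the same string (the names $original, $relaxed, $countOriginal and $countRelaxed are illustrative):

```php
<?php
$str = '| This is one. This is two.';

// Pattern from the question: every match must begin with ": " or a pipe.
$original = '/(:\s|\|+)(.*?)(\.|!|\?)/s';
// Pattern from this answer: plain whitespace may also begin a match.
$relaxed = '/(?:\s|\|)+(.*?)(?=[.!?])/';

// preg_match_all returns the number of full-pattern matches.
$countOriginal = preg_match_all($original, $str, $m1);
$countRelaxed  = preg_match_all($relaxed, $str, $m2);

echo "original: $countOriginal match(es)\n";
echo "relaxed:  $countRelaxed match(es)\n";
print_r($m2[1]);
```

With the sample string, the original pattern should report one match and the relaxed pattern two. Note that the relaxed pattern can start at any whitespace, so text in front of the first pipe may be matched as well.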

Demo and explanation

answered 2020-01-18T10:20:42.787

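Another option is to use \G to get iterative matches: it asserts the position at the end of the previous match, so each following sentence can be captured in a group, followed by a dot and 0+ horizontal whitespace characters.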

(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*

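In parts

  • (?: Non-capturing group
    • \|\h* Match | and 0+ horizontal whitespace characters
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start of the string
  • ) Close the group
  • ( Capture group 1
    • [^.\r\n]+ Match 1+ times any character except a dot or a newline
  • ) Close the group
  • \.\h* Match a literal . and 0+ horizontal whitespace characters

Regex demo | PHP demo

For example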

$re = '/(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*/';
$str = '| This is one. This is two.
John loves Mary.| This is one. This is two.';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);

Output

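Array
(
    [0] => Array
        (
            [0] => | This is one. 
            [1] => This is one
        )

    [1] => Array
        (
            [0] => This is two.
            [1] => This is two
        )

    [2] => Array
        (
            [0] => | This is one. 
            [1] => This is one
        )

    [3] => Array
        (
            [0] => This is two.
            [1] => This is two
        )

)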
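
The question also allows a colon as the start marker and ! or ? as terminators. A hedged sketch of how the \G pattern might be widened for that, assuming [|:] for the start and [.!?] for the end (this extension is not from the answer itself):

```php
<?php
// Sample lines from the question, joined with newlines.
$str = ": Sentence one. Sentence two. Sentence three.\n"
     . "| Sentence one. Sentence two?\n"
     . "| Sentence one. Sentence two! Sentence three?";

// Assumed extension of the answer's pattern:
// [|:] starts a run, [.!?] ends each sentence.
$re = '/(?:[|:]\h*|\G(?!^))([^.!?\r\n]+)[.!?]\h*/';

preg_match_all($re, $str, $matches);

// $matches[1] holds the captured sentences without their terminators.
print_r($matches[1]);
```

Because \G only continues from the end of the previous match, a line that does not start with a marker is skipped rather than picked up mid-text.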
answered 2020-01-18T12:11:25.417

For simplicity, find everything between a | and the last . that follows it, then split:

$input = "John loves Mary. | This is one. This is two. | Sentence 1. Sentence 2.";
preg_match_all('/\|\s*([^|]+)\./', $input, $matches);
if ($matches) {
    foreach($matches[1] as $match) {
        print_r(preg_split('/\.\s*/', $match));
    }
}

Prints:

Array
(
    [0] => This is one
    [1] => This is two
)
Array
(
    [0] => Sentence 1
    [1] => Sentence 2
)
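
A variation on the same find-then-split idea, sketched under two assumptions: sentences may also end in ! or ?, and a plain explode() on the pipe is enough to isolate each segment (this swaps the preg_match_all step for explode(); the $segments and $result names are illustrative):

```php
<?php
$input = "John loves Mary. | This is one. This is two! | Sentence 1? Sentence 2.";

// Split on pipes; element 0 is the text before the first pipe, so skip it.
$segments = array_slice(explode('|', $input), 1);

$result = [];
foreach ($segments as $segment) {
    // Split on a terminator plus trailing whitespace, dropping empty pieces.
    $result[] = preg_split('/[.!?]\s*/', trim($segment), -1, PREG_SPLIT_NO_EMPTY);
}

print_r($result);
```

PREG_SPLIT_NO_EMPTY removes the empty string that would otherwise follow the final terminator of each segment.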
answered 2020-01-18T12:17:35.477