
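I want to use preg_match_all in PHP to capture each of the following in its own group:

  1. Chapter, section, or page
  2. The number (and letter, if there is one) of the specified chapter, section, or page. If there is a space between them, it should be accounted for
  3. The words "and" and "or"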

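Keep in mind that I want to ignore all book titles, that the number of items in the string may vary, and that the regex should work for all of the following examples:

  1. Ch1 and Sect2b
  2. Ch 4 x unwantedtitle and Sect 5y unwanted title and Sect6 z and Ch7 or Ch8

This is what I have managed to come up with so far: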

    $str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
    preg_match_all ('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches);

    Array
    (
        [0] => Array
            (
                [0] => Pg3
            )

        [1] => Array
            (
                [0] => Pg
            )

        [2] => Array
            (
                [0] => 3
            )

        [3] => Array
            (
                [0] => 
            )

        [4] => Array
            (
                [0] => 
            )

    )

The expected result should be:

    Array
    (
        [0] => Array
            (
                [0] => Ch 1 a and 
                [1] => Sect 2b and 
                [2] => Pg3
            )

        [1] => Array
            (
                [0] => Ch
                [1] => Sect
                [2] => Pg
            )

        [2] => Array
            (
                [0] => 1
                [1] => 2
                [2] => 3
            )

        [3] => Array
            (
                [0] => a
                [1] => b
                [2] => 
            )

        [4] => Array
            (
                [0] => and
                [1] => and
                [2] => 
            )

    )

2 Answers


This is how I would do it.

    $arr = array(
        'Ch1 and Sect2b',
        'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3',
        'Ch 4 x unwantedtitle and Sect 5y unwanted title and' .
            ' Sect6 z and Ch7 or Ch8a',
        'Assume this is ch1a and ch 2 or ch seCt 5c.' .
            ' Then SECT or chA pg22a and pg 13 andor'
    );

    foreach ($arr as $a) {
        var_dump($a);
        preg_match_all(
            '~
                \b(?P<word>ch|sect|(pg))
                \s*(?P<number>\d+)
                (?(2)\b|
                    \s*
                    (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)?
                    \s*
                    (?:(?<=\s)(?P<cond>and|or)\b)?
                )
            ~xi',
            $a,
            $m
        );
        foreach ($m as $k => $v) {
            // this is for 'beautifying' the result array
            // note that $m[0] will still return whole matches
            if (is_numeric($k) && $k !== 0) unset($m[$k]);
        }
        print_r($m);
    }

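I had to make pg a capturing group because I needed to write an explicit condition for it: it can have a number attached (with or without a space in between), but it cannot have a letter attached, since a page indicator would not carry a letter, as in "pg23a".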
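To illustrate the conditional in isolation, here is a minimal sketch. It is a simplified version of the pattern above (a single optional letter, no "and"/"or" handling), and the sample strings are chosen only for illustration:

```php
<?php
// Group 2 is set only when the keyword is "pg". The conditional (?(2)...|...)
// then requires a word boundary right after the number, so no letter may follow;
// for "ch" and "sect" an optional single letter is allowed instead.
$pattern = '~\b(ch|sect|(pg))\s*(\d+)(?(2)\b|\s*([a-z])?)~i';

$samples = ['pg 13', 'pg22a', 'ch1a', 'sect 5'];
$results = [];
foreach ($samples as $s) {
    // Store the whole match, or null when the pattern does not match.
    $results[$s] = preg_match($pattern, $s, $m) === 1 ? $m[0] : null;
}

// "pg22a" is rejected; the other three samples match in full.
var_dump($results);
```

Only "pg22a" comes back as null here, because no word boundary exists between the digits and the trailing letter.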

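That is why I chose to name each group and "beautify" the result with the inner foreach loop in the code. Otherwise, if you choose to use numeric indexes (rather than named ones), you will need to skip $m[2] every time.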
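The same clean-up could also be written with array_filter instead of unsetting keys inside a loop. The helper name keepNamedGroups below is hypothetical, and the short pattern is only there to produce a result to filter:

```php
<?php
// Hypothetical helper: keep the whole-match entry (key 0) and the named groups,
// dropping the numeric duplicates that preg_match_all adds for each named group.
function keepNamedGroups(array $matches): array
{
    return array_filter(
        $matches,
        function ($key) {
            return $key === 0 || is_string($key);
        },
        ARRAY_FILTER_USE_KEY
    );
}

// Shortened pattern, used here only to have something to filter.
preg_match_all('~(?P<word>ch|sect|pg)\s*(?P<number>\d+)~i', 'Ch1 and Sect 2', $m);
$clean = keepNamedGroups($m);

print_r(array_keys($clean)); // keys left: 0, word, number
```

array_filter preserves the original keys, so the named entries remain addressable as $clean['word'] and $clean['number'].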

As an example, here is the output for the last item in $arr:

    Array
    (
        [0] => Array
            (
                [0] => ch1a and
                [1] => ch 2 or
                [2] => seCt 5c
                [3] => pg 13
            )

        [word] => Array
            (
                [0] => ch
                [1] => ch
                [2] => seCt
                [3] => pg
            )

        [number] => Array
            (
                [0] => 1
                [1] => 2
                [2] => 5
                [3] => 13
            )

        [letter] => Array
            (
                [0] => a
                [1] => 
                [2] => c
                [3] => 
            )

        [cond] => Array
            (
                [0] => and
                [1] => or
                [2] => 
                [3] => 
            )

    )
answered 2013-01-14T00:07:26.203

This is the closest I could get:

    $str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
    preg_match_all ('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches);


Array
(
    [0] => Array
        (
            [0] => Ch 1 a unwantedtitle and
            [1] => Sect 2b unwanted title and
            [2] => Pg3
        )

    [1] => Array
        (
            [0] => Ch 1 a
            [1] => Sect 2b
            [2] => Pg3
        )

    [2] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )

    [3] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [4] => Array
        (
            [0] => a
            [1] => b
            [2] => 
        )

    [5] => Array
        (
            [0] =>  unwantedtitle and
            [1] =>  unwanted title and
            [2] => 
        )

    [6] => Array
        (
            [0] => and
            [1] => and
            [2] => 
        )

)
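If the extra groups holding the titles are unwanted, one possible variation is to skip the title inside a non-capturing group so that only four groups remain. This is a sketch under two assumptions that are not part of the pattern above: the optional letter is a single character followed by a word boundary, and a conjunction (when present) appears before the next item.

```php
<?php
// Sketch only. Assumes a single-letter suffix and that "and"/"or", if present,
// occurs before the next Ch/Sect/Pg item. The title is consumed by the
// non-capturing group (?: ... ), so it never lands in a capture group.
$pattern = '/\b(Ch|Sect|Pg)\s*(\d+)\s*([a-z]\b)?(?:.*?\b(and|or)\b)?/i';

$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';
preg_match_all($pattern, $str, $matches);

// $matches[0] still spans the titles; groups 1 to 4 hold only the wanted parts.
list(, $words, $numbers, $letters, $conjunctions) = $matches;

print_r($words);        // Ch, Sect, Pg
print_r($numbers);      // 1, 2, 3
print_r($letters);      // a, b, (empty)
print_r($conjunctions); // and, and, (empty)
```

A title that starts with a one-letter word would be mistaken for the letter suffix, so this variation only fits input shaped like the examples in the question.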
answered 2013-01-13T19:23:57.400