I have the following string:

Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL

I need to extract Beyonce Knowles, Jay-Z, KANYE WEST, West Palm Beach, FL and San Antonio Texas (as separate items).

I'm still new to regular expressions, but so far I have /^[A-Z]+/

How can I fix my regex so that it extracts the words I'm after?

Thanks

1 Answer

You can try this:

/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u

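This matches one or more uppercase letters followed by zero or more letters of any case (\p{L} is not limited to lowercase). That unit may then repeat any number of times, with each repetition separated from the previous one by one or more whitespace or punctuation characters. It relies on Unicode character classes, so it also works on text in other languages; the u modifier makes PCRE treat the pattern and the subject as UTF-8.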
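The pieces described above can be easier to follow when the pattern is laid out one part per line. The following is a minimal sketch, assuming PCRE's x (extended) modifier, which ignores whitespace and # comments inside the pattern; the variable names and the shortened sample string are illustrative only and not part of the original answer.

```php
<?php
// Same pattern as above, written with the x modifier so each part
// can carry a comment. The u modifier is kept for UTF-8 handling.
$annotated = '/
    \p{Lu}+            # one or more uppercase letters
    \p{L}*             # then zero or more letters of any case
    (?:                # optionally followed by further capitalized words
        [\s\p{P}]+     #   one or more whitespace or punctuation characters
        \p{Lu}+\p{L}*  #   then another capitalized word
    )*                 # repeated zero or more times
/xu';

// Hypothetical shortened sample, not the string from the question.
$sample = 'Jay-Z and KANYE WEST met in West Palm Beach, FL';

preg_match_all($annotated, $sample, $matches);
print_r($matches[0]);
// $matches[0] is ['Jay-Z', 'KANYE WEST', 'West Palm Beach, FL']
```

The hyphen in Jay-Z and the comma in "Beach, FL" are both covered by \p{P}, which is why those names stay in one piece.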

Alternatively, this matches exactly two such words in a row:

/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u

For example:

$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
$pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
preg_match_all($pattern, $input, $output_array);

This produces the array:

Array
(
    [0] => Array 
        (
            [0] => Beyonce Knowles
            [1] => Jay-Z
            [2] => KANYE WEST
            [3] => San Antonio Texas
            [4] => West Palm Beach, FL
        )
)
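
For comparison, here is a sketch of what the two-word variant returns on the same input. It is not part of the original answer's example; the variable names $pairPattern and $pairs are illustrative. Because that pattern has no repetition group, it consumes exactly two capitalized words per match, so longer names are split or partly dropped.

```php
<?php
$input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

// Two capitalized words separated by whitespace or punctuation, no repetition.
$pairPattern = '/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u';

preg_match_all($pairPattern, $input, $pairs);
print_r($pairs[0]);
// $pairs[0] is:
// ['Beyonce Knowles', 'Jay-Z', 'KANYE WEST', 'San Antonio', 'West Palm', 'Beach, FL']
```

"Texas" is lost here because it is followed by a lowercase word and so cannot start a new pair, while "West Palm Beach, FL" is split into two matches. For the names in the question, the repeating pattern is therefore the better fit.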
answered 2013-08-17T20:26:48.877