
I have a problem with preg_split and UTF-8. This is the code:

$original['words'] = preg_split("/[\s]+/", $original['text']);
print_r($original);

This is the output:

Array
(
    [text] => Šios baterijos kaista
    [words] => Array
        (
            [0] => �
            [1] => ios
            [2] => baterijos
            [3] => kaista
        )
)

This code is running in the CakePHP framework. Note that [text] is displayed correctly, while the first word in [words] gets corrupted during the split. By the way, I also tried these settings:

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
ini_set('default_charset', 'utf-8');

None helped. Thank you.


2 Answers


You need to enable UTF-8 mode for preg_split by adding the u modifier to the regular expression:

preg_split("/[\s]+/u", $original['text']);

The configuration directives you mentioned trying while looking for a solution have no effect here.
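A likely explanation for the broken first element: Š is encoded in UTF-8 as the two bytes 0xC5 0xA0. Without the u modifier, PCRE matches byte by byte, and depending on the locale and PCRE build, the byte 0xA0 can be treated as whitespace by \s, so the split lands in the middle of the character. Here is a minimal sketch of the fix, assuming the input is valid UTF-8; the helper name splitWords is hypothetical and not part of PHP or CakePHP:

```php
<?php
// Hypothetical helper: split UTF-8 text into words on runs of whitespace.
function splitWords(string $text): array
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading/trailing whitespace.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // With the u modifier, preg_split() returns false when the subject
    // is not valid UTF-8; fall back to an empty array in that case.
    return $words === false ? [] : $words;
}

print_r(splitWords('Šios baterijos kaista'));
```

Returning an empty array for invalid input is only one possible choice; checking preg_last_error() and reporting the problem may fit better in real code.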

answered 2013-02-28T14:17:00.063

$original = mb_split("[\s]+", 'Šios baterijos kaista');
print_r($original);

Result:

Array
(
    [0] => Šios
    [1] => baterijos
    [2] => kaista
)

Notes:

1) When using mb_split, do not forget to remove the leading and trailing '/' from the regex pattern.

2) This only works when the mbstring extension is enabled.
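Because of note 2, code that cannot rely on mbstring may want a fallback. The following is a rough sketch under that assumption; the function name splitOnWhitespace is hypothetical, and the fallback branch uses preg_split with the u modifier:

```php
<?php
// Hypothetical helper: prefer mb_split(), fall back to preg_split() with /u.
function splitOnWhitespace(string $text): array
{
    if (function_exists('mb_split')) {
        // mb_split() takes the pattern without delimiters and uses
        // the encoding configured through mb_regex_encoding().
        mb_regex_encoding('UTF-8');
        $parts = mb_split('\s+', $text);
    } else {
        // mbstring (or its regex support) is not available.
        $parts = preg_split('/\s+/u', $text);
    }

    if ($parts === false) {
        return [];
    }

    // Remove empty strings produced by leading or trailing whitespace.
    return array_values(array_filter($parts, function ($part) {
        return $part !== '';
    }));
}

print_r(splitOnWhitespace('Šios baterijos kaista'));
```

Checking function_exists('mb_split') rather than only the extension covers builds where mbstring is present but its regex functions were disabled.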

answered 2019-04-09T15:05:54.307