我有这个函数,它利用 preg_replace_callback 将句子拆分为属于不同类别(字母、汉字、其他所有字符)的块的“链”。
该函数还试图将字符 '、 {和}包括为“字母”
function String_SplitSentence($string)
{
$res = array();
preg_replace_callback("~\b(?<han>\p{Han}+)\b|\b(?<alpha>[a-zA-Z0-9{}']+)\b|(?<other>[^\p{Han}A-Za-z0-9\s]+)~su",
function($m) use (&$res)
{
if (!empty($m["han"]))
{
$t = array("type" => "han", "text" => $m["han"]);
array_push($res,$t);
}
else if (!empty($m["alpha"]))
{
$t = array("type" => "alpha", "text" => $m["alpha"]);
array_push($res, $t);
}
else if (!empty($m["other"]))
{
$t = array("type" => "other", "text" => $m["other"]);
array_push($res, $t);
}
},
$string);
return $res;
}
但是,花括号似乎有问题。
print_r(String_SplitSentence("Many cats{1}, several rats{2}"));
从输出中可以看出,该函数将 { 视为字母字符,如所示,但在 } 处停止并将其视为“其他”。
Array
(
[0] => Array
(
[type] => alpha
[text] => Many
)
[1] => Array
(
[type] => alpha
[text] => cats{1
)
[2] => Array
(
[type] => other
[text] => },
)
[3] => Array
(
[type] => alpha
[text] => several
)
[4] => Array
(
[type] => alpha
[text] => rats{2
)
[5] => Array
(
[type] => other
[text] => }
)
我究竟做错了什么?