Viewed 2829 times
3 answers
4
Try with this function for word count:
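// The function name is arbitrary; rename it if you like.
if (!function_exists('mb_str_word_count'))
{
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        // $charlist is accepted for parity with str_word_count() but is not used.
        mb_internal_encoding('UTF-8');
        mb_regex_encoding('UTF-8');
        // Split on runs of characters outside the Arabic block (U+0600 to U+06FF)
        // and drop the empty strings left at the edges of the input.
        $words = array_values(array_filter(
            mb_split('[^\x{0600}-\x{06FF}]+', $string),
            function ($word) { return $word !== ''; }
        ));
        switch ($format) {
            case 0:
                return count($words);
            default:
                // Formats 1 and 2 both return the plain list of words.
                return $words;
        }
    }
}
echo mb_str_word_count("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا") . PHP_EOL;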
Resources
- Unicode list for arabic
- A Rule-Based Arabic Stemming Algorithm
- A Rule and Template Based Stemming Algorithm for Arabic Language (seems more complete)
Recommendations
- Use the tag <meta charset="UTF-8"/> in HTML files
- Always add Content-type: text/html; charset=utf-8 headers when serving pages
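In a PHP page, both recommendations could be applied roughly as follows. This is a minimal sketch, assuming the script produces the HTML itself and that nothing has been output before the header() call; the page title and sample paragraph are placeholders.

```php
<?php
// Must run before any output is sent, otherwise PHP cannot set the header.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html lang="ar" dir="rtl">
<head>
    <meta charset="UTF-8"/>
    <title>Word count</title>
</head>
<body>
    <p><?php echo htmlspecialchars("القاهرة هى عاصمة مصر", ENT_QUOTES, 'UTF-8'); ?></p>
</body>
</html>
```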
answered 2012-12-14T18:31:25.180
3
This version also accepts ASCII characters:
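if (!function_exists('mb_str_word_count'))
{
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        // $charlist is accepted for parity with str_word_count() but is not used.
        $string = trim($string);
        if ($string === '') {
            $words = array();
        } else {
            // Split on runs of anything that is not a letter, a digit or an apostrophe.
            // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing punctuation.
            $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
        }
        switch ($format) {
            case 0:
                return count($words);
            default:
                // Formats 1 and 2 both return the plain list of words.
                return $words;
        }
    }
}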
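To see how the `\p{L}\p{N}` pattern from this answer behaves on mixed input, here is a small sketch. The helper name `split_words` and its `$keepMarks` flag are hypothetical and not part of the answer. One thing worth knowing: vocalized Arabic carries combining marks (Unicode category `\p{M}`), which the pattern treats as separators. Adding `\p{M}` to the character class is one possible adjustment, shown here as an assumption rather than as the answer's method.

```php
<?php
// Hypothetical helper, not part of the answer: same split, optionally keeping combining marks.
function split_words(string $text, bool $keepMarks = false): array
{
    $class = $keepMarks ? '\p{L}\p{M}\p{N}\'' : '\p{L}\p{N}\'';
    $words = preg_split('~[^' . $class . ']+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8; treat that as "no words".
    return $words === false ? [] : $words;
}

// Mixed Arabic, Latin letters, digits and punctuation.
print_r(split_words("القاهرة, Cairo 2012 (مصر)"));

// "مصر" written with a kasra (U+0650) and a sukun (U+0652).
$vocalized = "\u{0645}\u{0650}\u{0635}\u{0652}\u{0631}";
echo count(split_words($vocalized)), PHP_EOL;        // marks act as separators
echo count(split_words($vocalized, true)), PHP_EOL;  // marks stay inside the word
```

The `"\u{...}"` escapes need PHP 7.0 or later. Whether diacritics should count as part of a word depends on the text being processed, so the flag is left off by default.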
answered 2013-07-18T13:59:11.883
1
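A while ago I wanted to estimate the reading time of a paragraph and ran into the same problem, so I simply counted the spaces in the paragraph :) (note that this is not very accurate, but it was good enough for me)
Like this: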
substr_count($text, ' ') + 1;
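Counting single spaces overcounts when words are separated by several spaces and returns 1 for an empty string. A slightly more tolerant variant of the same idea splits on runs of whitespace instead. This is only a sketch: the function names are hypothetical, and the 200 words per minute default is an assumed reading speed, not a figure from this thread.

```php
<?php
// Hypothetical helper: count words as maximal runs of non-whitespace characters.
function count_whitespace_words(string $text): int
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8; treat that as zero words.
    return $words === false ? 0 : count($words);
}

// Hypothetical helper: reading time in minutes at an assumed reading speed.
function estimate_reading_minutes(string $text, int $wordsPerMinute = 200): float
{
    return count_whitespace_words($text) / $wordsPerMinute;
}

$text = "القاهرة  هى عاصمة مصر"; // note the double space after the first word
echo substr_count($text, ' ') + 1, PHP_EOL;   // space-based count
echo count_whitespace_words($text), PHP_EOL;  // whitespace-run count
echo estimate_reading_minutes($text), PHP_EOL;
```

Like the original one-liner, this still treats punctuation attached to a word as part of that word, which is usually acceptable for a reading-time estimate.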
answered 2021-11-28T19:55:39.743