php - str_word_count() function doesn't display Arabic properly

Question

score 4 · Accepted Answer

Try this function for counting words:

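// You can name the function however you like
if (!function_exists('mb_str_word_count'))
{
    // $charlist is accepted for compatibility with str_word_count() but is ignored
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        mb_regex_encoding('UTF-8');

        // Split on runs of characters outside the Arabic block (U+0600 to U+06FF).
        // Arabic punctuation such as U+060C (the Arabic comma) is inside this block,
        // so it stays attached to the word it follows.
        // array_filter() drops the empty strings left by leading/trailing separators.
        $words = array_values(array_filter(
            mb_split('[^\x{0600}-\x{06FF}]+', $string),
            'strlen'
        ));

        switch ($format) {
            case 0:
                return count($words);
            case 1:
            case 2:
            default:
                // Unlike str_word_count(), format 2 is not keyed by word position here
                return $words;
        }
    }
}

echo mb_str_word_count("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا") . PHP_EOL; // 8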
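The function above returns the same list for formats 1 and 2, whereas str_word_count() with format 2 returns an associative array keyed by each word's position in the string. If you need that behaviour, here is a minimal sketch. The name arabic_word_positions is hypothetical; it reuses the same U+0600 to U+06FF range and reports character offsets rather than byte offsets, which is an assumption about what you want for multibyte text.

```php
<?php
// Hypothetical helper: mimics str_word_count($s, 2) for Arabic text by returning
// an array keyed by each word's character offset in $string.
// Assumes valid UTF-8 input and the U+0600..U+06FF range used above.
function arabic_word_positions(string $string): array
{
    $result = [];
    // PREG_OFFSET_CAPTURE reports byte offsets, even in /u mode
    preg_match_all('~[\x{0600}-\x{06FF}]+~u', $string, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$word, $byteOffset]) {
        // Convert the byte offset into a character offset
        $charOffset = mb_strlen(substr($string, 0, $byteOffset), 'UTF-8');
        $result[$charOffset] = $word;
    }
    return $result;
}

print_r(arabic_word_positions("مصر هى عاصمة"));
```

For "مصر هى عاصمة" this yields the keys 0, 4 and 7, one per word.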

Resources

Unicode list for Arabic
A Rule-Based Arabic Stemming Algorithm
A Rule and Template Based Stemming Algorithm for Arabic Language (seems more complete)

Recommendations

Use the <meta charset="UTF-8"/> tag in HTML files
Always send a Content-Type: text/html; charset=utf-8 header when serving pages
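Both recommendations can be applied from PHP roughly as follows. This is a sketch: arabic_page is a hypothetical helper, and the header() call only has an effect when PHP runs behind a web server, not on the command line.

```php
<?php
// Declare UTF-8 in the HTTP header; this must run before any output is sent
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: wraps text in a page that also declares UTF-8 via
// <meta charset>, escaping the text for safe HTML output
function arabic_page(string $text): string
{
    return '<!DOCTYPE html><html><head><meta charset="UTF-8"/></head><body><p>'
        . htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
        . '</p></body></html>';
}

echo arabic_page('القاهرة هى عاصمة مصر');
```

Declaring the charset in both places means the browser decodes the page as UTF-8 whether it relies on the header or, for a saved copy, on the meta tag.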

score 3 · Accepted Answer

This version also accepts ASCII characters (it matches any Unicode letter or digit, not only the Arabic block):

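if (!function_exists('mb_str_word_count'))
{
    // $charlist is accepted for compatibility with str_word_count() but is ignored
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        // A word is a run of letters (\p{L}), combining marks such as Arabic
        // harakat (\p{M}), digits (\p{N}) and apostrophes.
        // PREG_SPLIT_NO_EMPTY drops the empty pieces produced by leading or
        // trailing punctuation, and returns an empty array for an empty string.
        $words = preg_split('~[^\p{L}\p{M}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
        switch ($format) {
            case 0:
                return count($words);
            case 1:
            case 2:
            default:
                return $words;
        }
    }
}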
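A character class built only from \p{L} and \p{N} treats Arabic diacritics (harakat such as fatha U+064E, damma U+064F and shadda U+0651, all in Unicode category Mn) as separators, so a vowelized word is split into pieces. The sketch below compares the two classes on one vowelized word; count_words_with is a hypothetical helper written only for this comparison.

```php
<?php
// Hypothetical helper: counts words using a caller-supplied "word character" class
function count_words_with(string $class, string $text): int
{
    $words = preg_split('~[^' . $class . ']+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? 0 : count($words);
}

// "Muhammad" with harakat: meem, damma, hah, fatha, meem, shadda, fatha, dal, dammatan
$vowelized = "\u{0645}\u{064F}\u{062D}\u{064E}\u{0645}\u{0651}\u{064E}\u{062F}\u{064C}";

// Letters and digits only: every diacritic acts as a separator
echo count_words_with('\p{L}\p{N}\'', $vowelized) . PHP_EOL;       // 4
// Adding \p{M} keeps combining marks inside the word
echo count_words_with('\p{L}\p{M}\p{N}\'', $vowelized) . PHP_EOL;  // 1
```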

score 1 · Accepted Answer

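A while ago I wanted to estimate the reading time of a paragraph and ran into the same problem, so I simply counted the spaces in the paragraph :) Note that this is less accurate, but it was good enough for me.

Like this: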

substr_count($text, ' ') + 1;
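Counting single spaces overcounts when words are separated by repeated spaces and undercounts when they are separated by newlines or tabs. A slightly more tolerant variant of the same idea splits on runs of whitespace. Both function names are hypothetical, and the default of 200 words per minute is an assumed reading speed, not a measured figure.

```php
<?php
// Hypothetical helper: whitespace-based word count that tolerates repeated
// spaces, tabs and newlines. Assumes $text is valid UTF-8.
function whitespace_word_count(string $text): int
{
    $parts = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? 0 : count($parts);
}

// Hypothetical helper: reading time in whole minutes, rounded up.
// 200 words per minute is an assumption; adjust it for your audience.
function reading_time_minutes(string $text, int $wordsPerMinute = 200): int
{
    $words = whitespace_word_count($text);
    return $words === 0 ? 0 : (int) ceil($words / $wordsPerMinute);
}

$text = "مصر  هى"; // two words separated by a double space
echo substr_count($text, ' ') + 1, PHP_EOL; // 3: the extra space counts as a word
echo whitespace_word_count($text), PHP_EOL; // 2
echo reading_time_minutes($text), PHP_EOL;  // 1
```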
