8

I want to check whether a string in PHP contains multibyte characters. Any idea how to do this?

Example:

<?php
$string = "I dont have idea that is what i am...";
if( is_multibyte( $string ) )
{
    echo 'yes!!';
}else{
    echo 'ups!';
}
?>

Maybe (8-byte rule):

<?php
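// strlen() counts bytes and mb_strlen() counts characters,
// so a multibyte string has fewer characters than bytes
if( mb_strlen( $string, 'UTF-8' ) < strlen( $string ) )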
{
    return true;
}
else
{
    return false;
}
?>

I have read: Variable-width encoding - WIKI, UTF-8 - WIKI

3 Answers

9

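There are two interpretations. The first is that every character in the string is multibyte. The second is that the string contains at least one multibyte character. If you are interested in handling invalid byte sequences, see https://stackoverflow.com/a/13695364/531320 for details.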

function is_all_multibyte($string)
{
    // reject strings that contain an invalid UTF-8 byte sequence
    if (mb_check_encoding($string, 'UTF-8') === false) return false;

    $length = mb_strlen($string, 'UTF-8');

    for ($i = 0; $i < $length; $i += 1) {

        $char = mb_substr($string, $i, 1, 'UTF-8');

        // reject the string as soon as one character is single-byte (ASCII)
        if (mb_check_encoding($char, 'ASCII')) {

            return false;

        }

    }

    return true;

}

function contains_any_multibyte($string)
{
    return !mb_check_encoding($string, 'ASCII') && mb_check_encoding($string, 'UTF-8');
}

$data = ['東京', 'Tokyo', '東京(Tokyo)'];

var_dump(
    [true, false, false] ===
    array_map(function($v) {
        return is_all_multibyte($v);
    },
    $data),
    [true, false, true] ===
    array_map(function($v) {
        return contains_any_multibyte($v);
    },
    $data)
);
Answered 2013-05-31T08:55:32.640
8

I'm not sure whether there is a better way, but one quick approach that comes to mind is:

if (mb_strlen($str) != strlen($str)) {
    echo "yes";
} else {
    echo "no";
}
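
Note that the comparison above uses whatever encoding mb_internal_encoding() returns, so its result can change with the PHP configuration. As a minimal sketch of an alternative that does not rely on mbstring, the hypothetical helper has_non_ascii_byte() below (not part of the original answer) looks for any byte outside the 7-bit ASCII range. It assumes the input is in an ASCII-compatible encoding such as UTF-8 and that this source file is itself saved as UTF-8; it would also report true for accented characters in a single-byte encoding such as Latin1.

```php
<?php
// Hypothetical helper: true if $str contains at least one byte >= 0x80.
function has_non_ascii_byte($str)
{
    // Without the /u modifier, PCRE matches byte by byte.
    return preg_match('/[^\x00-\x7F]/', $str) === 1;
}

var_dump(has_non_ascii_byte('Tokyo'));        // bool(false): ASCII only
var_dump(has_non_ascii_byte('東京(Tokyo)'));  // bool(true): contains UTF-8 multibyte characters
```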
Answered 2013-05-29T18:45:06.980
2

To determine if something is multibyte or not you need to be specific about which character set you're using. If your character set is Latin1, for example, no strings will be multibyte. If your character set is UTF-16, every string is multibyte.
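
A short sketch of that point, assuming the mbstring extension is loaded: the same bytes yield different character counts depending on the encoding passed to mb_strlen(), while strlen() always counts bytes. The sample byte string is an illustration, not something from the question.

```php
<?php
// The three bytes below encode the single character 東 in UTF-8.
$bytes = "\xE6\x9D\xB1";

var_dump(strlen($bytes));                   // int(3): strlen() always counts bytes
var_dump(mb_strlen($bytes, 'UTF-8'));       // int(1): one 3-byte character
var_dump(mb_strlen($bytes, 'ISO-8859-1'));  // int(3): read as Latin1, every byte is one character

// Even plain ASCII text takes two bytes per character in UTF-16.
$utf16 = mb_convert_encoding('Tokyo', 'UTF-16BE', 'UTF-8');
var_dump(strlen($utf16));                   // int(10)
var_dump(mb_strlen($utf16, 'UTF-16BE'));    // int(5)
```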

That said, if you only care about one specific character set, say UTF-8, you can test whether mb_strlen() returns less than strlen(), provided you pass the encoding parameter explicitly.

function is_multibyte($s) {
  return mb_strlen($s,'utf-8') < strlen($s);
}
Answered 2013-05-29T19:43:50.193