I have a file containing multiple lines of comma-separated Chinese words,
like this:
你,我,他,好,但,中,国,龙
好,把,是,的,啊,人,吖,哦
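I want to load them into an array with the following code, so that I can later use the array to look up which Chinese words an article contains: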
$ds = file($Dictionary);
$_SP_ = chr(0xFF).chr(0xFE);
$array = array();
foreach($ds as $d)
{
    $spstr = $_SP_;
    $spstr = iconv('ucs-2be', 'utf-8', $spstr);
    $ws = explode(',', $d);//array of single Chinese word
    $wall = iconv('utf-8', 'ucs-2be', join($spstr, $ws));//what is $wall used for?
    $ws = explode($_SP_, $wall);
    foreach($ws as $estr)
    {
        $array[$estr] = strlen($estr);
    }
}
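For comparison, here is a minimal sketch of what I think the loop is ultimately trying to build, assuming the dictionary file is UTF-8 and skipping the UCS-2BE round trip entirely. `loadWords` is a hypothetical helper of my own, not part of the code above, and the two sample lines are hard-coded in place of `file($Dictionary)`.

```php
<?php
// Hypothetical helper (not from the original code): builds a
// word => byte-length map from lines of comma-separated UTF-8 words.
function loadWords(array $lines): array
{
    $words = array();
    foreach ($lines as $line) {
        // file() keeps the trailing newline on each line, so trim it first.
        foreach (explode(',', trim($line)) as $word) {
            if ($word !== '') {
                // strlen() counts bytes, so this is the UTF-8 byte length.
                $words[$word] = strlen($word);
            }
        }
    }
    return $words;
}

// Stand-in for file($Dictionary), using the sample lines from above.
$lines = array(
    "你,我,他,好,但,中,国,龙\n",
    "好,把,是,的,啊,人,吖,哦\n",
);
$words = loadWords($lines);
```

With this version the keys stay in UTF-8 and each single-character word has a stored length of 3 bytes, whereas the original code stores UCS-2BE keys, where `strlen` gives 2 bytes per character. I am not sure whether that difference matters for the later lookup.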
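My questions:
What does $_SP_ = chr(0xFF).chr(0xFE) mean? chr(0xFF).chr(0xFE) is a two-byte string built from 0xFF and 0xFE, the two highest single-byte values. What is the combination of the two for? And why should I convert $_SP_ from ucs-2be to utf-8?
Why is $ws joined back into a string again, this time separated by the utf-8 form of chr(0xFF).chr(0xFE)? And why does the code need the length of each word?
Why is $spstr treated as UCS-2BE? Is it only because it is the combination chr(0xFF).chr(0xFE)?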