I've read Wikipedia's article on Windows-1252 character encoding. For characters whose byte value is < 128, it should be the same as ASCII/UTF-8.
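The ASCII-compatibility claim can be checked directly with a conversion instead of detection. The following is a minimal sketch, assuming the mbstring extension is loaded; `lowBytesUnchanged` is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: confirm that every byte 0x00-0x7F is unchanged
// when converted from Windows-1252 to UTF-8.
function lowBytesUnchanged(): bool
{
    for ($i = 0; $i < 128; $i++) {
        $byte = chr($i);
        if (mb_convert_encoding($byte, 'UTF-8', 'Windows-1252') !== $byte) {
            return false;
        }
    }
    return true;
}

var_export(lowBytesUnchanged()); // true
echo "\n";

// A byte above 0x7F does change: 0x92 becomes the three-byte UTF-8 form of U+2019.
echo bin2hex(mb_convert_encoding("\x92", 'UTF-8', 'Windows-1252')), "\n"; // e28099
```

Conversion works here even though detection does not, because the two features rely on different parts of mbstring.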

This makes sense:

php -r "var_export(mb_detect_encoding(\"\x92\", 'windows-1252', true));"
'Windows-1252'

A right curly apostrophe (byte 0x92, the right single quotation mark) is detected properly.

php -r "var_export(mb_detect_encoding(\"a\", 'windows-1252', true));"
false

Huh? The letter "a" isn't Windows-1252?

My terminal, where I'm running this, is set to UTF-8, so the letter 'a' should be the same byte sequence as in ASCII. To minimize the variables, I also tried specifying the exact Windows-1252 byte:

php -r "var_export(mb_detect_encoding(\"\x61\", 'windows-1252', true));"
false

Changing the "strict" parameter (which has pretty useless documentation) does nothing in these cases.

1 Answer

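Encoding detection is not supported for Windows-1252. According to the mb_detect_order documentation:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.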
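Since mbstring has no detection filter for Windows-1252, one possible workaround is a hand-written heuristic. The sketch below is an assumption about how that might look, not something mbstring provides: `guessEncoding` is a hypothetical name, and the fallback relies on the fact that bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252. It cannot tell Windows-1252 apart from other single-byte encodings such as ISO-8859-1.

```php
<?php
// Hypothetical helper, not part of mbstring: a rough encoding guess
// limited to ASCII, UTF-8 and Windows-1252.
function guessEncoding(string $bytes): ?string
{
    // Pure 7-bit input is valid in all three encodings; report it as ASCII.
    if (preg_match('/^[\x00-\x7F]*$/D', $bytes) === 1) {
        return 'ASCII';
    }
    // Well-formed multi-byte sequences are most likely UTF-8.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    // Otherwise accept Windows-1252 unless an unassigned byte appears.
    if (preg_match('/[\x81\x8D\x8F\x90\x9D]/', $bytes) === 0) {
        return 'Windows-1252';
    }
    return null;
}

var_export(guessEncoding("a"));            // 'ASCII'
echo "\n";
var_export(guessEncoding("\x92"));         // 'Windows-1252'
echo "\n";
var_export(guessEncoding("\xE2\x80\x99")); // 'UTF-8'
echo "\n";
```

The order of the checks matters: UTF-8 is tested before the single-byte fallback because almost any byte string passes the Windows-1252 check.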

Answered 2014-03-02T10:31:41.637