php - Unexpected result from mb_detect_encoding with Windows-1252

Question

I've read Wikipedia's article on Windows-1252 character encoding. For characters whose byte value is < 128, it should be the same as ASCII/UTF-8.

This makes sense:

php -r "var_export(mb_detect_encoding(\"\x92\", 'windows-1252', true));"
'Windows-1252'

A right curly apostrophe (byte 0x92 in Windows-1252) is detected properly.

php -r "var_export(mb_detect_encoding(\"a\", 'windows-1252', true));"
false

Huh? The letter "a" isn't Windows-1252?

My terminal, where I'm running this, is set to UTF-8, so the letter 'a' should be the same byte sequence as in ASCII. To minimize the variables, I also tried specifying the Windows-1252 byte explicitly:

php -r "var_export(mb_detect_encoding(\"\x61\", 'windows-1252', true));"
false

Changing the "strict" parameter (which has pretty useless documentation) does nothing in these cases.

score 4 · Accepted Answer

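Encoding detection is not supported for Windows-1252. From the mb_detect_order documentation:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will always fail.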
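
If the real goal is to normalise input rather than to name its encoding, one possible workaround is to avoid detection altogether: check whether the bytes are well-formed UTF-8 and otherwise treat them as Windows-1252. This is only a sketch. The helper name to_utf8 is hypothetical (it is not part of mbstring), and the fallback rests on the assumption that your input can only ever be UTF-8 or Windows-1252.

```php
<?php
// Hypothetical helper, not part of mbstring: normalise a byte string to UTF-8.
// Assumption: anything that is not well-formed UTF-8 is Windows-1252.
function to_utf8(string $bytes): string
{
    // Plain ASCII and well-formed UTF-8 pass through unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise reinterpret the bytes as Windows-1252 and convert.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// "a" is valid UTF-8 already; "\x92" alone is not, so it gets converted.
var_dump(to_utf8("a") === "a");
var_dump(to_utf8("\x92") === "\u{2019}");
```

Validating with mb_check_encoding sidesteps the detection filters quoted above, at the cost of hard-coding the fallback encoding.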
