
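For some reason, in some browsers the CP-1252 ellipsis (0x85) is displayed as ů. I believe the server declares the page as UTF-8 (don't ask me why a UTF-8 server is serving CP-1252; that's out of scope). I would understand a warning being raised, since a lone 0x85 byte is not valid UTF-8. I would understand it being displayed as the Latin-1 character U+0085 NEXT LINE (NEL). But for the life of me I can't figure out why it is displayed as U+016F LATIN SMALL LETTER U WITH RING ABOVE.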

This is what I see: [screenshot]

Here is a hexdump -C of the file:

00000000  78 78 78 78 78 78 78 78  78 78 78 78 78 78 78 78  |xxxxxxxxxxxxxxxx|
*
00000030  78 85 3c 2f 69 3e 0d 0a                           |x.</i>..|
00000038
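To reproduce the symptom outside a browser, here is a minimal Python 3 sketch. It assumes the bytes are exactly those in the last hexdump row, and decode_as is a hypothetical helper. It decodes that row under each interpretation mentioned above, plus CP852:

```python
def decode_as(data: bytes, encoding: str, errors: str = "strict") -> str:
    """Decode data, returning a marker instead of raising on invalid input."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError:
        return "<UnicodeDecodeError>"


# Last row of the hexdump: "x", the byte 0x85, "</i>", then CR LF.
tail = bytes.fromhex("78 85 3c 2f 69 3e 0d 0a")

for encoding, errors in [("cp1252", "strict"), ("latin-1", "strict"),
                         ("utf-8", "strict"), ("utf-8", "replace"),
                         ("cp852", "strict")]:
    # repr() makes control characters such as U+0085 visible.
    print(f"{encoding:8} {errors:8} {decode_as(tail, encoding, errors)!r}")
```

Only the cp852 line produces "xů</i>". Strict UTF-8 rejects the byte, and "replace" substitutes U+FFFD. Also note that browsers following the WHATWG Encoding Standard treat the label iso-8859-1 as windows-1252, so in a browser "Latin-1" would normally show the ellipsis rather than NEL.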

Answer


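A blatant case of mojibake. Some time ago I wrote a small .bat script that displays the mapping from the (best-known) OEM and ANSI code pages to Unicode, and vice versa. Here is its output for the specific byte 0x85: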

==> alts.bat 0x85
CP/ACP  Hex  Codepoint  #Description   :show8bit 133 <--> 0x85)
------  ---  ---------  ------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP437   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP737   0x85    0x0396  #GREEK CAPITAL LETTER ZETA
CP775   0x85    0x0123  #LATIN SMALL LETTER G WITH CEDILLA
CP850   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP852   0x85    0x016f  #LATIN SMALL LETTER U WITH RING ABOVE
CP855   0x85    0x0401  #CYRILLIC CAPITAL LETTER IO
CP857   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP860   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP861   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP862   0x85    0x05d5  #HEBREW LETTER VAV
CP863   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP864   0x85    0x2500  #FORMS LIGHT HORIZONTAL
CP865   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP866   0x85    0x0415  #CYRILLIC CAPITAL LETTER IE
CP869   0x85            #UNDEFINED
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x85            #DBCS LEAD BYTE
CP936   0x85            #DBCS LEAD BYTE
CP949   0x85            #DBCS LEAD BYTE
CP950   0x85            #DBCS LEAD BYTE

==>
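The original alts.bat is not shown here. The following Python 3 sketch is only an approximation of its byte-to-Unicode lookup, built on Python's built-in codecs. CODE_PAGES, map_byte and code_pages_producing are hypothetical names, and Python's tables may differ from Windows in a few undefined slots (for example, cp869 at 0x85):

```python
import unicodedata
from typing import Dict, List, Optional

# The single-byte ANSI (cp125x, cp874) and OEM (cp4xx/cp8xx) code pages from
# the table above, spelled the way Python's codecs module names them.
CODE_PAGES = [f"cp{n}" for n in (1250, 1251, 1252, 1253, 1254, 1255, 1256,
                                 1257, 1258, 437, 737, 775, 850, 852, 855,
                                 857, 860, 861, 862, 863, 864, 865, 866,
                                 869, 874)]


def map_byte(byte: int) -> Dict[str, Optional[str]]:
    """Map one byte to its character in each code page (None if undefined)."""
    result: Dict[str, Optional[str]] = {}
    for cp in CODE_PAGES:
        try:
            result[cp] = bytes([byte]).decode(cp)
        except UnicodeDecodeError:
            result[cp] = None
    return result


def code_pages_producing(byte: int, char: str) -> List[str]:
    """List the code pages in which `byte` decodes to `char`."""
    return [cp for cp, ch in map_byte(byte).items() if ch == char]


for cp, ch in map_byte(0x85).items():
    desc = unicodedata.name(ch, "<no name>") if ch is not None else "UNDEFINED"
    code = f"0x{ord(ch):04x}" if ch is not None else ""
    print(f"{cp:7} 0x85  {code:7} #{desc}")

# Which code page turns the ellipsis byte into "ů"?
print(code_pages_producing(0x85, "\u016f"))
```

Among the listed code pages, only cp852 maps 0x85 to U+016F, which matches the CP852 row of the table.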

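Note the CP852 row: CP852 (the DOS Central European code page) is the only code page listed that maps 0x85 to U+016F, so the browser apparently decoded that byte as CP852. And here is the reverse lookup for code point 0x2026 (sorry for the misaligned output columns in the rows for the non-Windows code pages):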

==> alts.bat 0x2026
CP/ACP  Hex  Codepoint  #Description   :show16bit 8230 <--> 0x2026
------  ---  ---------  -------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x8163  0x2026  #HORIZONTAL ELLIPSIS
CP936   0xA1AD  0x2026  #HORIZONTAL ELLIPSIS
CP949   0xA1A6  0x2026  #HORIZONTAL ELLIPSIS
CP950   0xA14B  0x2026  #HORIZONTAL ELLIPSIS
macCYRILLIC_CP  0xC9    0x2026  #HORIZONTAL ELLIPSIS
macGREEK_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macICELAND_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS
macLATIN2_CP    0xC9    0x2026  #HORIZONTAL ELLIPSIS
macROMAN_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macTURKISH_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS

==>
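The reverse direction can be approximated the same way. This Python 3 sketch encodes U+2026 with Python's codecs for some of the encodings listed. ENCODINGS and encode_char are hypothetical names, and an encoding that cannot represent the character prints n/a:

```python
from typing import Optional

# Python codec names for some of the encodings in the reverse table above.
ENCODINGS = ["cp1252", "cp874", "cp932", "cp936", "cp949", "cp950",
             "mac_roman", "mac_cyrillic", "mac_greek", "mac_iceland",
             "mac_latin2", "mac_turkish"]


def encode_char(ch: str, encoding: str) -> Optional[bytes]:
    """Return the byte sequence for ch in encoding, or None if unmappable."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ENCODINGS:
    raw = encode_char("\u2026", enc)
    print(f"{enc:13} {'0x' + raw.hex().upper() if raw else 'n/a'}")
```

The DBCS code pages (cp932, cp936, cp949, cp950) need two bytes for the ellipsis, while the Mac encodings use the single byte 0xC9.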

Further reading: Encodings and code pages

answered 2016-06-20T16:01:48.437