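If you really want to be sure that users submitting text from poorly behaved browsers cannot corrupt your data pipeline, you can also use HEBCI: HTML Entity-Based Codepage Inference.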
In essence, it works like this:
Every code page has its own fingerprint. For example, the single entity &ordm; ("º") is enough to tell the big three apart: ISO-8859-1/Windows-1252 (=BA), MacRoman (=BC), and UTF-8 (=C2BA).
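In your form, you add a hidden input whose value contains these fingerprints as entities (such as &deg;, &divide;, and &mdash;). When the user submits the form, you check the hex values of the bytes that come back and compare them against your fingerprint table. Only if nothing matches do you move on to other fallback solutions.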
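As a rough sketch of that round trip, the Perl below builds the hidden field and turns the bytes that come back into hex strings ready for comparison. The field name `hebci`, the comma separator, and the helper names `hidden_field` and `returned_hex` are all assumptions made for illustration. How you get at the raw, undecoded bytes of the submitted value depends on your form-handling code.

```perl
use strict;
use warnings;

# Entities used as fingerprints (the three named in the text above).
my @fp_ents = qw/deg divide mdash/;

# Build the hidden input. The field name "hebci" is a hypothetical choice.
# The entities are comma-separated so the returned values can be split apart.
sub hidden_field {
    my $value = join ',', map { "&$_;" } @fp_ents;
    return qq{<input type="hidden" name="hebci" value="$value">};
}

# Convert the raw bytes the browser sent back into lowercase hex strings,
# one per entity. $raw must be the undecoded byte string from the request.
sub returned_hex {
    my ($raw) = @_;
    return map { unpack 'H*', $_ } split /,/, $raw, -1;
}

print hidden_field(), "\n";

# A browser encoding the form as Windows-1252 would send B0 , F7 , 97:
print join(' ', returned_hex("\xb0,\xf7,\x97")), "\n";   # prints "b0 f7 97"
```

The hex strings produced this way are what you look up in the fingerprint table.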
A slightly larger implementation needs only five code points to work well:
# Entities sent in the hidden field, in fingerprint order.
my @fp_ents = qw/deg divide mdash bdquo euro/;

# Hex bytes each code page produces for the entities above;
# '' means the code page cannot encode that entity.
my %fingerprints = (
    "UTF-8"        => ['c2b0','c3b7','e28094','e2809e','e282ac'],
    "WINDOWS-1252" => ['b0','f7','97','84','80'],
    "MAC"          => ['a1','d6','d1','e3','db'],
    "MS-HEBR"      => ['b0','ba','97','84','80'],
    "MAC-CYRILLIC" => ['a1','d6','d1','d7',''],
    "MS-GREEK"     => ['b0','','97','84','80'],
    "MAC-IS"       => ['a1','d6','d0','e3',''],
    "MS-CYRL"      => ['b0','','97','84','88'],
    "MS932"        => ['818b','8180','815c','',''],
    "WINDOWS-31J"  => ['818b','8180','815c','',''],
    "WINDOWS-936"  => ['a1e3','a1c2','a1aa','',''],
    "MS_KANJI"     => ['818b','8180','','',''],
    "ISO-8859-15"  => ['b0','f7','','','a4'],
    "ISO-8859-1"   => ['b0','f7','','',''],
    "CSIBM864"     => ['80','dd','','',''],
);
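One way to use such a table, sketched here under stated assumptions: compare the returned hex strings against each code page and keep the one that agrees on the most entities without contradicting any. `guess_codepage` is a hypothetical helper name, and the table in the sketch is a four-entry excerpt of the full one. Treating an empty fingerprint as "no evidence either way" is my own choice, not something the table prescribes. Code pages with identical fingerprints, such as MS932 and WINDOWS-31J, cannot be told apart this way, so ties go to whichever name sorts first.

```perl
use strict;
use warnings;

# Excerpt of the full fingerprint table; '' means no byte sequence is
# listed for that entity in that code page.
my %fingerprints = (
    "UTF-8"        => ['c2b0','c3b7','e28094','e2809e','e282ac'],
    "WINDOWS-1252" => ['b0','f7','97','84','80'],
    "MAC"          => ['a1','d6','d1','e3','db'],
    "ISO-8859-1"   => ['b0','f7','','',''],
);

# Hypothetical helper: takes the hex strings returned for each entity
# (in @fp_ents order) and returns the best-matching code page, or undef.
sub guess_codepage {
    my @got = map { lc($_) } @_;
    my ($best, $best_score) = (undef, 0);
    CP: for my $cp (sort keys %fingerprints) {
        my @want  = @{ $fingerprints{$cp} };
        my $score = 0;
        for my $i (0 .. $#want) {
            next if $want[$i] eq '';    # assumption: empty slot is ignored
            # Any disagreement rules this code page out entirely.
            next CP if !defined $got[$i] || $got[$i] ne $want[$i];
            $score++;
        }
        ($best, $best_score) = ($cp, $score) if $score > $best_score;
    }
    return $best;    # undef: no match, move on to other fallbacks
}

my $cp = guess_codepage(qw/b0 f7 97 84 80/);
print defined $cp ? $cp : 'unknown', "\n";    # prints "WINDOWS-1252"
```

Scoring by the number of agreeing entities is what lets a full Windows-1252 match win over ISO-8859-1, whose fingerprint covers only the first two entities.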