3

是否有 String.toLowerCase() 和 String.toUpperCase() 的 JavaScript polyfill 实现,或者 JavaScript 中可以处理 Unicode 字符并在浏览器之间保持一致的其他方法?

背景资料

执行以下操作将在浏览器中产生不同的结果,甚至在浏览器版本之间(例如 FireFox 54 与 55):

document.write(String.fromCodePoint(223).normalize("NFKC").toLowerCase().toUpperCase().toLowerCase())

在 Firefox 55 中它为您提供ss,在 Firefox 54 中它为您提供ß

通常这很好,诸如 Locales 之类的机制可以处理很多您想要的情况;等 BaaS 系统通信)时,它可以大大简化您在客户端处理内部数据的交互。

4

1 回答 1

3

请注意,这个问题似乎只影响过时的 Firefox 版本,因此除非您明确需要支持这些旧版本,否则您可以选择根本不打扰。您的示例的行为在所有现代浏览器中都是相同的(因为 Firefox 发生了变化)。这可以使用 jsvu + eshost进行验证:

$ jsvu # Update installed JavaScript engine binaries to the latest version.

$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss

#### V8 --harmony
ss

#### JavaScriptCore
ss

#### V8
ss

#### SpiderMonkey
ss

#### xs
ss

但是您问如何解决这个问题,所以让我们继续。

https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase的第 4 步状态:

根据 Unicode 默认大小写转换算法,让我们cuList成为一个列表,其中元素是 的结果。toLowercase(cpList)

Unicode 默认大小写转换算法在 Unicode 标准的第 3.13 节默认大小写算法中指定。

Unicode 字符的完整大小写映射是通过使用 from 的映射SpecialCasing.txt加上 from 的映射获得的UnicodeData.txt,不包括后面的任何会冲突的映射。在这些文件中没有映射的任何字符都被认为映射到自身。

[…]

以下规则指定 Unicode 字符串的默认大小写转换操作。这些规则使用完整的大小写转换操作、、、Uppercase_Mapping(C)Lowercase_Mapping(C)Titlecase_Mapping(C)以及基于大小写上下文的上下文相关映射,如表 3-17 中所指定。

对于字符串X

  • R1 :将toUppercase(X)每个字符映射C到.XUppercase_Mapping(C)
  • R2 :将toLowercase(X)每个字符映射C到.XLowercase_Mapping(C)

这是来自 的示例SpecialCasing.txt,下面添加了我的注释:

00DF  ; 00DF   ; 0053 0073; 0053 0053;                      # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title>  ; <upper>  ; (<condition_list>;)? # <comment>

该行表示 U+00DF ( 'ß') 小写为 U+00DF ( ß),大写为 U+0053 U+0053 ( SS)。

这是来自 的示例UnicodeData.txt,下面添加了我的注释:

0041  ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;; 0061   ;
<code>; <name>                ; <ignore>       ; <lower>; <upper>

该行表示 U+0041 ( 'A') 小写为 U+0061 ( 'a')。它没有显式的大写映射,这意味着它本身是大写的。

这是另一个示例UnicodeData.txt

0061  ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;; ;0041;        ; 0041
<code>; <name>              ; <ignore>            ; <lower>; <upper>

这一行表示 U+0061 ( 'a') 大写为 U+0041 ( 'A')。它没有明确的小写映射,这意味着它自己小写。

您可以编写一个脚本来解析这两个文件,读取这些示例之后的每一行,并构建小写/大写映射。然后,您可以将这些映射转换为提供符合规范toLowerCase/toUpperCase功能的小型 JavaScript 库。

这似乎需要做很多工作。根据 Firefox 中的旧行为以及究竟发生了什么变化(?),您可能会将工作限制在SpecialCasing.txt. (根据您提供的示例,我假设 Firefox 55 中仅更改了特殊外壳。)

// Instead of…
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const lowercased = normalized.toLowerCase();
  return lowercased;
}

// …one could do something like:
function lowerCaseSpecialCases(string) {
  // TODO: replace all SpecialCasing.txt characters with their lowercase
  // mapping.
  return string.replace(/TODO/g, fn);
}
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
  const lowercased = fixed.toLowerCase();
  return lowercased;
}

我编写了一个脚本来解析SpecialCasing.txt并生成一个 JS 库,该库实现了lowerCaseSpecialCases上面提到的功能(as toLower)以及toUpper. 它是:https ://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c根据您的具体用例,您可能根本不需要toUpper及其对应的正则表达式和映射。这是完整的生成库:

const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
  ['\u0130', 'i\u0307'],
  ['\u1F88', '\u1F80'],
  ['\u1F89', '\u1F81'],
  ['\u1F8A', '\u1F82'],
  ['\u1F8B', '\u1F83'],
  ['\u1F8C', '\u1F84'],
  ['\u1F8D', '\u1F85'],
  ['\u1F8E', '\u1F86'],
  ['\u1F8F', '\u1F87'],
  ['\u1F98', '\u1F90'],
  ['\u1F99', '\u1F91'],
  ['\u1F9A', '\u1F92'],
  ['\u1F9B', '\u1F93'],
  ['\u1F9C', '\u1F94'],
  ['\u1F9D', '\u1F95'],
  ['\u1F9E', '\u1F96'],
  ['\u1F9F', '\u1F97'],
  ['\u1FA8', '\u1FA0'],
  ['\u1FA9', '\u1FA1'],
  ['\u1FAA', '\u1FA2'],
  ['\u1FAB', '\u1FA3'],
  ['\u1FAC', '\u1FA4'],
  ['\u1FAD', '\u1FA5'],
  ['\u1FAE', '\u1FA6'],
  ['\u1FAF', '\u1FA7'],
  ['\u1FBC', '\u1FB3'],
  ['\u1FCC', '\u1FC3'],
  ['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));

const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
  ['\xDF', 'SS'],
  ['\uFB00', 'FF'],
  ['\uFB01', 'FI'],
  ['\uFB02', 'FL'],
  ['\uFB03', 'FFI'],
  ['\uFB04', 'FFL'],
  ['\uFB05', 'ST'],
  ['\uFB06', 'ST'],
  ['\u0587', '\u0535\u0552'],
  ['\uFB13', '\u0544\u0546'],
  ['\uFB14', '\u0544\u0535'],
  ['\uFB15', '\u0544\u053B'],
  ['\uFB16', '\u054E\u0546'],
  ['\uFB17', '\u0544\u053D'],
  ['\u0149', '\u02BCN'],
  ['\u0390', '\u0399\u0308\u0301'],
  ['\u03B0', '\u03A5\u0308\u0301'],
  ['\u01F0', 'J\u030C'],
  ['\u1E96', 'H\u0331'],
  ['\u1E97', 'T\u0308'],
  ['\u1E98', 'W\u030A'],
  ['\u1E99', 'Y\u030A'],
  ['\u1E9A', 'A\u02BE'],
  ['\u1F50', '\u03A5\u0313'],
  ['\u1F52', '\u03A5\u0313\u0300'],
  ['\u1F54', '\u03A5\u0313\u0301'],
  ['\u1F56', '\u03A5\u0313\u0342'],
  ['\u1FB6', '\u0391\u0342'],
  ['\u1FC6', '\u0397\u0342'],
  ['\u1FD2', '\u0399\u0308\u0300'],
  ['\u1FD3', '\u0399\u0308\u0301'],
  ['\u1FD6', '\u0399\u0342'],
  ['\u1FD7', '\u0399\u0308\u0342'],
  ['\u1FE2', '\u03A5\u0308\u0300'],
  ['\u1FE3', '\u03A5\u0308\u0301'],
  ['\u1FE4', '\u03A1\u0313'],
  ['\u1FE6', '\u03A5\u0342'],
  ['\u1FE7', '\u03A5\u0308\u0342'],
  ['\u1FF6', '\u03A9\u0342'],
  ['\u1F80', '\u1F08\u0399'],
  ['\u1F81', '\u1F09\u0399'],
  ['\u1F82', '\u1F0A\u0399'],
  ['\u1F83', '\u1F0B\u0399'],
  ['\u1F84', '\u1F0C\u0399'],
  ['\u1F85', '\u1F0D\u0399'],
  ['\u1F86', '\u1F0E\u0399'],
  ['\u1F87', '\u1F0F\u0399'],
  ['\u1F88', '\u1F08\u0399'],
  ['\u1F89', '\u1F09\u0399'],
  ['\u1F8A', '\u1F0A\u0399'],
  ['\u1F8B', '\u1F0B\u0399'],
  ['\u1F8C', '\u1F0C\u0399'],
  ['\u1F8D', '\u1F0D\u0399'],
  ['\u1F8E', '\u1F0E\u0399'],
  ['\u1F8F', '\u1F0F\u0399'],
  ['\u1F90', '\u1F28\u0399'],
  ['\u1F91', '\u1F29\u0399'],
  ['\u1F92', '\u1F2A\u0399'],
  ['\u1F93', '\u1F2B\u0399'],
  ['\u1F94', '\u1F2C\u0399'],
  ['\u1F95', '\u1F2D\u0399'],
  ['\u1F96', '\u1F2E\u0399'],
  ['\u1F97', '\u1F2F\u0399'],
  ['\u1F98', '\u1F28\u0399'],
  ['\u1F99', '\u1F29\u0399'],
  ['\u1F9A', '\u1F2A\u0399'],
  ['\u1F9B', '\u1F2B\u0399'],
  ['\u1F9C', '\u1F2C\u0399'],
  ['\u1F9D', '\u1F2D\u0399'],
  ['\u1F9E', '\u1F2E\u0399'],
  ['\u1F9F', '\u1F2F\u0399'],
  ['\u1FA0', '\u1F68\u0399'],
  ['\u1FA1', '\u1F69\u0399'],
  ['\u1FA2', '\u1F6A\u0399'],
  ['\u1FA3', '\u1F6B\u0399'],
  ['\u1FA4', '\u1F6C\u0399'],
  ['\u1FA5', '\u1F6D\u0399'],
  ['\u1FA6', '\u1F6E\u0399'],
  ['\u1FA7', '\u1F6F\u0399'],
  ['\u1FA8', '\u1F68\u0399'],
  ['\u1FA9', '\u1F69\u0399'],
  ['\u1FAA', '\u1F6A\u0399'],
  ['\u1FAB', '\u1F6B\u0399'],
  ['\u1FAC', '\u1F6C\u0399'],
  ['\u1FAD', '\u1F6D\u0399'],
  ['\u1FAE', '\u1F6E\u0399'],
  ['\u1FAF', '\u1F6F\u0399'],
  ['\u1FB3', '\u0391\u0399'],
  ['\u1FBC', '\u0391\u0399'],
  ['\u1FC3', '\u0397\u0399'],
  ['\u1FCC', '\u0397\u0399'],
  ['\u1FF3', '\u03A9\u0399'],
  ['\u1FFC', '\u03A9\u0399'],
  ['\u1FB2', '\u1FBA\u0399'],
  ['\u1FB4', '\u0386\u0399'],
  ['\u1FC2', '\u1FCA\u0399'],
  ['\u1FC4', '\u0389\u0399'],
  ['\u1FF2', '\u1FFA\u0399'],
  ['\u1FF4', '\u038F\u0399'],
  ['\u1FB7', '\u0391\u0342\u0399'],
  ['\u1FC7', '\u0397\u0342\u0399'],
  ['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));
于 2018-11-27T19:08:43.693 回答