77

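I need to add a UTF-8 byte order mark to text data generated on the client side. How do I do that?

Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course.

Neither did '\uBBEF\x22BF' work (with '\x22' == '"' being the next character in content).

Is it possible to prepend the UTF-8 BOM to generated text in JavaScript?

Yes, I really do need the UTF-8 BOM in this case.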

4 Answers

161

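Prepend \ufeff to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx

See the discussion between @jeff-fischer and @casey for details on UTF-8, UTF-16, and the BOM. What actually makes the above work is that the string \ufeff is always used to represent the BOM, regardless of whether UTF-8 or UTF-16 is being used.

See page 36 in Chapter 2 of The Unicode Standard 5.0 for a detailed explanation. A quote from that page:

The Table 2-4 entry for UTF-8 byte order is marked N/A because UTF-8 code units are 8 bits in size, and the usual machine issues of endianness for larger code units do not apply. The serialized order of the bytes must not depart from the order defined by the UTF-8 encoding form. Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.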
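To see why \ufeff works where '\xEF\xBB\xBF' from the question does not, it can help to look at the UTF-8 bytes each string produces. The following is a minimal sketch using the standard TextEncoder; utf8Hex is a hypothetical helper name introduced only for this illustration.

```javascript
// Encode a string as UTF-8 and format the bytes as uppercase hex.
// utf8Hex is a hypothetical helper, not part of any library.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

// "\ufeff" is one code point (U+FEFF); UTF-8 encodes it as three bytes.
const bomHex = utf8Hex("\ufeff");

// "\xEF\xBB\xBF" is three separate code points (U+00EF, U+00BB, U+00BF),
// and each of them takes two bytes in UTF-8.
const latin1Hex = utf8Hex("\xEF\xBB\xBF");

console.log(bomHex);    // EF BB BF
console.log(latin1Hex); // C3 AF C2 BB C2 BF
```

The second result is why the attempt in the question shows stray characters in front of the data instead of a BOM.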

answered 2013-07-26T10:51:20.003
35

I had the same issue, and this is the solution I came up with:

var blob = new Blob([
    new Uint8Array([0xEF, 0xBB, 0xBF]), // UTF-8 BOM
    "Text",
    ... // Remaining data
], { type: "text/plain;charset=utf-8" });

Using a Uint8Array prevents the browser from converting those bytes into a string (tested on Chrome and Firefox).

You should replace text/plain with your desired MIME type.
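The same idea can be wrapped in a small function so that the MIME type is a parameter. This is only a sketch: makeUtf8BlobWithBom is a hypothetical name, and it assumes an environment with a global Blob (browsers, or a recent Node.js).

```javascript
// Build a Blob whose first three bytes are the UTF-8 BOM.
// makeUtf8BlobWithBom is a hypothetical helper for illustration.
function makeUtf8BlobWithBom(text, mimeType) {
  // Raw bytes, so nothing re-encodes them as characters.
  const bom = new Uint8Array([0xEF, 0xBB, 0xBF]);
  return new Blob([bom, text], { type: mimeType + ";charset=utf-8" });
}

const blob = makeUtf8BlobWithBom('"my data"', "text/csv");

// 3 BOM bytes plus the 9 ASCII bytes of the text.
console.log(blob.size);
```

Checking blob.size is a quick way to confirm that exactly three bytes were added in front of the text.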

answered 2016-12-28T13:25:54.730
21

I'm editing my original answer. The above answer really demands elaboration as this is a convoluted solution by Node.js.

The short answer is, yes, this code works.

The long answer is, no, the byte sequence FE FF is not the byte order mark for UTF-8. Apparently Node took some sort of shortcut for writing encodings to files. FE FF is the UTF-16 big-endian byte order mark, as can be seen in the Byte Order Mark Wikipedia article, and the bytes actually written can be inspected in a binary editor after the file has been written. I've verified this is the case.

http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

Apparently, Node.js uses \ufeff to signify the byte order mark for any number of encodings. It takes the \ufeff marker and converts it into the correct byte order mark based on the third (options) parameter of writeFile, which is where you pass the encoding string. Node.js takes this encoding string and converts the fixed \ufeff marker into the byte order mark of the actual encoding.

UTF-8 Example:

fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf8' }, function(err) {
   /* The actual byte order mark written to the file is EF BB BF */
});

UTF-16 Little Endian Example:

fs.writeFile(someFilename, '\ufeff' + html, { encoding: 'utf16le' }, function(err) {
   /* The actual byte order mark written to the file is FF FE */
});

So, as you can see, \ufeff is simply a marker that can stand for any number of resulting encodings. The bytes that actually make it into the file depend directly on the encoding option specified. The marker used within the string is really irrelevant to what gets written to the file.

I suspect the reasoning behind this is that they chose not to write byte order marks, and the 3-byte mark for UTF-8 isn't easily encoded into the JavaScript string to be written to disk. So they used \ufeff as a placeholder mark within the string, which gets substituted at write time.
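One way to check which bytes result, without writing a file, is to encode the string directly. This is a minimal sketch using Node's Buffer; encodedHex is a hypothetical helper, and the sketch assumes fs.writeFile encodes a string the same way Buffer.from does. The output is consistent with \ufeff being a single code point (U+FEFF) that each encoding serializes in its own way.

```javascript
// Return the bytes Node.js produces for `text` in `encoding`, as lowercase hex.
// encodedHex is a hypothetical helper for illustration.
function encodedHex(text, encoding) {
  return Buffer.from(text, encoding).toString("hex");
}

const utf8Bytes = encodedHex("\ufeff", "utf8");
const utf16leBytes = encodedHex("\ufeff", "utf16le");

console.log(utf8Bytes);    // efbbbf
console.log(utf16leBytes); // fffe
```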

answered 2015-01-16T00:46:05.043
11

Here is my solution:

var blob = new Blob(["\uFEFF" + csv], {
    type: 'text/csv; charset=utf-8'
});
answered 2018-11-20T21:04:48.600