javascript - How to convert a UTF-16 file to UTF-8 in Node.js

Question

I have an XML file encoded in UTF-16 that I want to convert to UTF-8 so that I can process it. If I use this command:

iconv -f UTF-16 -t UTF-8 file.xml > converted_file.xml

the file is converted correctly and I can process it. I want to do the same thing in Node.js.
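For reference, a streaming Node.js counterpart of that iconv command can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the thread: it assumes the input is little-endian UTF-16 (which is what Node calls `utf16le`/`ucs2`), it drops a leading byte-order mark, it needs Node.js 15+ for `stream/promises`, and `convertUtf16leFileToUtf8` is a hypothetical helper name.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: roughly `iconv -f UTF-16 -t UTF-8 in > out`.
// ASSUMPTION: the input file is UTF-16LE (with or without a BOM).
async function convertUtf16leFileToUtf8(inputPath, outputPath) {
  await pipeline(
    // With an encoding set, the read stream emits decoded strings; Node's
    // StringDecoder reassembles byte pairs and surrogate pairs split across chunks.
    fs.createReadStream(inputPath, { encoding: 'utf16le' }),
    async function* (source) {
      let first = true;
      for await (let chunk of source) {
        if (first) {
          first = false;
          // Drop a leading BOM (U+FEFF) so it is not written out as EF BB BF.
          if (chunk.charCodeAt(0) === 0xfeff) chunk = chunk.slice(1);
        }
        if (chunk.length > 0) yield chunk;
      }
    },
    // Strings written to a file stream are encoded as UTF-8 by default.
    fs.createWriteStream(outputPath)
  );
}
```

Streaming keeps memory use flat for large files; for small files the synchronous `readFileSync`/`writeFileSync` pair in the accepted answer is simpler.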

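Right now I have the file in a Buffer, and I have tried everything I could think of and everything I could find online, without success.

Here are some of the things I have tried so far: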

content = new Buffer((new Buffer(content, 'ucs2')).toString('utf8'));
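The attempt above never actually decodes UTF-16: if `content` is already a Buffer, the encoding argument to `new Buffer` is ignored and the bytes are simply copied, so `.toString('utf8')` then reads UTF-16 bytes as UTF-8 (ASCII text comes out with a NUL character after every letter). A minimal sketch of an in-memory conversion, assuming `content` is a Buffer of UTF-16LE bytes; `utf16leBufferToUtf8` is a hypothetical name, and `Buffer.from` is used because `new Buffer()` is deprecated:

```javascript
// Hypothetical helper: decode UTF-16LE bytes to a string, then re-encode as UTF-8.
function utf16leBufferToUtf8(content) {
  // 'utf16le' (alias 'ucs2') reads each byte pair as one UTF-16 code unit.
  let text = content.toString('utf16le');
  // Drop a leading BOM (U+FEFF) so it is not carried over as EF BB BF.
  if (text.charCodeAt(0) === 0xfeff) text = text.slice(1);
  return Buffer.from(text, 'utf8');
}
```

Usage: `content = utf16leBufferToUtf8(content);` leaves `content` as a Buffer holding UTF-8 bytes.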

I have also tried using these functions:

http://jonisalonen.com/2012/from-utf-16-to-utf-8-in-javascript/ https://stackoverflow.com/a/14601808/1405208

The first one changed nothing, and the second one only gave me Chinese characters.
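The question does not show exactly what input went into that function, so this is only one plausible explanation of the symptom: when bytes that are not UTF-16LE code units (for example ASCII/UTF-8 bytes) are decoded two at a time as UTF-16, the resulting 16-bit values tend to fall in the CJK ranges. A small illustration in Node.js:

```javascript
// Illustration only: ASCII/UTF-8 bytes decoded as if they were UTF-16LE.
// Each pair of one-byte characters becomes a single 16-bit code unit.
const asciiBytes = Buffer.from('<?xm', 'utf8'); // bytes 3C 3F 78 6D
const misread = asciiBytes.toString('utf16le'); // code units 0x3F3C and 0x6D78
const codeUnits = Array.from(misread, (c) => c.charCodeAt(0).toString(16));
console.log(codeUnits); // [ '3f3c', '6d78' ]
```

Both values lie in CJK ideograph blocks (U+3400 to U+4DBF and U+4E00 to U+9FFF), which is why such output looks like Chinese text.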

score 5 · Accepted Answer

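const fs = require('fs');
// 'ucs2' is Node's alias for UTF-16LE: the bytes are decoded into a JS string
var content = fs.readFileSync('myfile.xml', {encoding:'ucs2'});
fs.writeFileSync('myfile.xml', content, {encoding:'utf8'});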
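Two details the answer above leaves as-is, noted here as an editorial observation: a UTF-16 file usually starts with a byte-order mark, which `'ucs2'` decoding keeps as U+FEFF and UTF-8 encoding then writes as the bytes EF BB BF; and an XML declaration such as `<?xml version="1.0" encoding="UTF-16"?>` would still claim UTF-16, which a strict XML parser may reject. A sketch that handles both, assuming UTF-16LE input and a declaration in that common form; `convertXmlFileToUtf8` is a hypothetical name:

```javascript
const fs = require('fs');

// Hypothetical helper building on the accepted answer.
// ASSUMPTION: the source file is UTF-16LE ('utf16le' is the same as 'ucs2').
function convertXmlFileToUtf8(inputPath, outputPath) {
  let content = fs.readFileSync(inputPath, { encoding: 'utf16le' });
  // Remove the BOM, which decoding leaves at the start as U+FEFF.
  if (content.charCodeAt(0) === 0xfeff) content = content.slice(1);
  // Rewrite encoding="UTF-16" (or UTF-16LE/BE) in the XML declaration, if present.
  content = content.replace(
    /^(<\?xml[^>]*?\bencoding\s*=\s*["'])UTF-16(?:LE|BE)?(["'])/i,
    '$1UTF-8$2'
  );
  fs.writeFileSync(outputPath, content, { encoding: 'utf8' });
}
```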

score 2 · Accepted Answer

Although my answer above is the best answer to the question as asked, I hope this answer helps anyone who needs to read the file as a binary string:

const reader = new FileReader();
reader.readAsBinaryString(this.fileToImport);

In my case the file was in UTF-16 and I was trying to read it into XLSX:

const wb = XLSX.read(bstr, { type: "binary" });

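Combining the two links above, I first removed the first two characters, the byte-order mark (bytes 0xFF 0xFE) that marks the data as UTF-16LE, and then used this link to build the correct numbers (although I think it actually produces UTF-7): https://stackoverflow.com/a/14601808/1405208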

Finally, I applied the second link to get the correct set of UTF-8 numbers: https://stackoverflow.com/a/14601808/1405208
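As a worked example of the arithmetic that the surrogate-pair branch of the code below is meant to perform, take U+1F600, which UTF-16 stores as the high surrogate 0xD83D followed by the low surrogate 0xDE00. The code point is rebuilt from the low 10 bits of each half, then split into four UTF-8 bytes:

$$
\begin{aligned}
c &= \mathtt{0x10000} + \big( ((\mathtt{0xD83D} \,\&\, \mathtt{0x3FF}) \ll 10) \mid (\mathtt{0xDE00} \,\&\, \mathtt{0x3FF}) \big) \\
  &= \mathtt{0x10000} + \big( (\mathtt{0x3D} \ll 10) \mid \mathtt{0x200} \big) = \mathtt{0x10000} + \mathtt{0xF600} = \mathtt{0x1F600} \\[4pt]
\text{bytes} &= \big( \mathtt{0xF0} \mid (c \gg 18),\ \mathtt{0x80} \mid ((c \gg 12) \,\&\, \mathtt{0x3F}),\ \mathtt{0x80} \mid ((c \gg 6) \,\&\, \mathtt{0x3F}),\ \mathtt{0x80} \mid (c \,\&\, \mathtt{0x3F}) \big) \\
  &= (\mathtt{0xF0},\ \mathtt{0x9F},\ \mathtt{0x98},\ \mathtt{0x80})
\end{aligned}
$$

Note that the low 10 bits must come from the second code unit (0xDE00); combining the first unit with itself gives a different, wrong code point.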

The code I ended up with:

decodeUTF16LE(binaryStr) {
      // Only convert input that starts with a UTF-16LE byte-order mark (bytes 0xFF 0xFE)
      if (binaryStr.charCodeAt(0) !== 0xff || binaryStr.charCodeAt(1) !== 0xfe) {
        return binaryStr;
      }
      const utf8 = [];
      // Each UTF-16 code unit spans two bytes (two chars) of the binary string
      for (let i = 2; i < binaryStr.length; i += 2) {
        // Little-endian: low byte first, then high byte
        let charcode = binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8);
        if (charcode < 0x80) utf8.push(charcode);
        else if (charcode < 0x800) {
          utf8.push(0xc0 | (charcode >> 6), 0x80 | (charcode & 0x3f));
        } else if (charcode < 0xd800 || charcode >= 0xe000) {
          utf8.push(0xe0 | (charcode >> 12), 0x80 | ((charcode >> 6) & 0x3f), 0x80 | (charcode & 0x3f));
        }
        // surrogate pair
        else {
          // The low surrogate is the next two bytes, not the next single char
          i += 2;
          const low = binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8);
          // UTF-16 encodes 0x10000-0x10FFFF by
          // subtracting 0x10000 and splitting the
          // 20 bits of 0x0-0xFFFFF into two halves
          charcode = 0x10000 + (((charcode & 0x3ff) << 10) | (low & 0x3ff));
          utf8.push(
            0xf0 | (charcode >> 18),
            0x80 | ((charcode >> 12) & 0x3f),
            0x80 | ((charcode >> 6) & 0x3f),
            0x80 | (charcode & 0x3f)
          );
        }
      }
      // Return a binary string: one char per UTF-8 byte
      return String.fromCharCode.apply(String, utf8);
},
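A shorter alternative to `decodeUTF16LE` above, assuming a runtime with the built-in `TextDecoder`/`TextEncoder` (modern browsers, and Node.js 11+ where they are globals), lets the platform do the UTF-16LE decoding and UTF-8 encoding. It is a sketch that keeps the same input and output shape (a binary string in, a binary string of UTF-8 bytes out); `decodeUTF16LEWithTextDecoder` is a hypothetical name.

```javascript
// Hypothetical alternative to decodeUTF16LE using TextDecoder/TextEncoder.
// Input: a binary string (one char per byte, as from readAsBinaryString).
// Output: a binary string holding the UTF-8 bytes.
function decodeUTF16LEWithTextDecoder(binaryStr) {
  // Same rule as the original: only convert when a UTF-16LE BOM (FF FE) is present.
  if (binaryStr.charCodeAt(0) !== 0xff || binaryStr.charCodeAt(1) !== 0xfe) {
    return binaryStr;
  }
  const bytes = new Uint8Array(binaryStr.length);
  for (let i = 0; i < binaryStr.length; i++) bytes[i] = binaryStr.charCodeAt(i) & 0xff;
  // With the default ignoreBOM: false, TextDecoder consumes the BOM itself.
  const text = new TextDecoder('utf-16le').decode(bytes);
  const utf8 = new TextEncoder().encode(text);
  // Build the result in chunks to stay under apply()'s argument-count limits.
  let out = '';
  for (let i = 0; i < utf8.length; i += 0x8000) {
    out += String.fromCharCode.apply(null, utf8.subarray(i, i + 0x8000));
  }
  return out;
}
```

One behavioral difference: malformed input (a lone surrogate or an odd trailing byte) is replaced with U+FFFD, encoded as EF BF BD, instead of being passed through bit by bit.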
