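I recently discovered the <codecvt> header, so I wanted to convert between UTF-8 and UTF-16.
I am using the codecvt_utf8_utf16 facet with wstring_convert from C++11.
The problem I have is that when I convert a UTF-16 string to UTF-8 and then back to UTF-16, the endianness changes.
For this code: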
#include <codecvt>
#include <string>
#include <locale>
#include <iostream>

using namespace std;

int main(int argc, char const *argv[])
{
    // Converter between UTF-8 bytes and UTF-16 code units.
    wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> convert;

    u16string utf16 = u"\ub098\ub294\ud0dc\uc624";

    // Print each original UTF-16 code unit as a hex number.
    cout << hex << "UTF-16\n\n";
    for (char16_t c : utf16)
        cout << "[" << static_cast<unsigned>(c) << "] ";

    // UTF-16 -> UTF-8, printed byte by byte.
    string utf8 = convert.to_bytes(utf16);
    cout << "\n\nUTF-16 to UTF-8\n\n";
    for (unsigned char c : utf8)
        cout << "[" << int(c) << "] ";

    // UTF-8 -> UTF-16 again; this should match the first line of output.
    cout << "\n\nConverting back to UTF-16\n\n";
    utf16 = convert.from_bytes(utf8);
    for (char16_t c : utf16)
        cout << "[" << static_cast<unsigned>(c) << "] ";
    cout << endl;
}
I get this output:
UTF-16
[b098] [b294] [d0dc] [c624]
UTF-16 to UTF-8
[eb] [82] [98] [eb] [8a] [94] [ed] [83] [9c] [ec] [98] [a4]
Converting back to UTF-16
[98b0] [94b2] [dcd0] [24c6]
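To check that the round-tripped values differ from the originals only in byte order (and not through some other corruption), the comparison can be sketched roughly as below. The names swap_bytes, is_byte_swapped, and check_reported_output are hypothetical helpers introduced for illustration; they are not part of <codecvt>. The values are copied from the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: swap the two bytes of a single UTF-16 code unit.
constexpr char16_t swap_bytes(char16_t c)
{
    return static_cast<char16_t>((c << 8) | (c >> 8));
}

// True if every unit of `after` is the byte-swapped form of the
// corresponding unit of `before`.
bool is_byte_swapped(const std::u16string& before, const std::u16string& after)
{
    if (before.size() != after.size())
        return false;
    for (std::size_t i = 0; i < before.size(); ++i)
        if (after[i] != swap_bytes(before[i]))
            return false;
    return true;
}

// Compares the code units printed before and after the round trip.
inline void check_reported_output()
{
    const std::u16string original{0xb098, 0xb294, 0xd0dc, 0xc624};
    const std::u16string round_trip{0x98b0, 0x94b2, 0xdcd0, 0x24c6};
    assert(is_byte_swapped(original, round_trip));
}
```

Calling check_reported_output() from main passes for the values shown, so each code unit that comes back has exactly its high and low bytes exchanged.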
When I set the third template parameter of the codecvt_utf8_utf16 facet passed to wstring_convert to std::little_endian, the bytes are reversed.
What am I missing?