
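I recently discovered the <codecvt> header, so I wanted to use it to convert between UTF-8 and UTF-16.

I am using the codecvt_utf8_utf16 facet with wstring_convert from C++11. The problem is that when I convert a UTF-16 string to UTF-8 and then back to UTF-16, the endianness changes.

For this code: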

#include <codecvt>  
#include <string>  
#include <locale>  
#include <iostream>  

using namespace std;  

int main(int argc, char const *argv[])
{
  wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t>
                                                convert;

  u16string utf16 = u"\ub098\ub294\ud0dc\uc624";

  cout << hex << "UTF-16\n\n";
  for (char16_t c : utf16)
    cout << "[" << c << "] ";

  string utf8 = convert.to_bytes(utf16);

  cout << "\n\nUTF-16 to UTF-8\n\n";
  for (unsigned char c : utf8)
    cout << "[" << int(c) << "] ";
  cout << "\n\nConverting back to UTF-16\n\n";

  utf16 = convert.from_bytes(utf8);

  for (char16_t c : utf16)
    cout << "[" << c << "] ";
  cout << endl;
}

I get this output:

UTF-16

[b098] [b294] [d0dc] [c624]

UTF-16 to UTF-8

[eb] [82] [98] [eb] [8a] [94] [ed] [83] [9c] [ec] [98] [a4]

Converting back to UTF-16

[98b0] [94b2] [dcd0] [24c6]

When I set the third template parameter of the codecvt_utf8_utf16 facet (the codecvt_mode) to std::little_endian, the bytes are reversed.

What am I missing?
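
One way to confirm that the mismatch is purely a byte-order swap, rather than some other decoding error, is to swap the two bytes of each original code unit and compare the result with the round-trip output. A minimal sketch; byteswap_units is a hypothetical helper name, not part of the standard library:

```cpp
#include <string>

// Hypothetical helper: swap the high and low byte of every UTF-16 code unit.
std::u16string byteswap_units(const std::u16string& s) {
    std::u16string out;
    out.reserve(s.size());
    for (char16_t c : s) {
        // c is promoted to int, so the shifts cannot overflow.
        out.push_back(static_cast<char16_t>(((c & 0x00FF) << 8) |
                                            ((c & 0xFF00) >> 8)));
    }
    return out;
}
```

Applied to the original string, this yields b0 98 -> 98 b0 and so on, which matches the values printed after converting back, so the code points themselves appear to be decoded correctly and only the byte order of each unit is wrong.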


1 Answer


It is indeed a bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66855. It will be fixed in GCC 5.3.
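
Until a fixed libstdc++ is available, one possible workaround is to skip from_bytes for the UTF-8 to UTF-16 direction and decode by hand. The following is a minimal sketch under stated assumptions, not the library's implementation: utf8_to_utf16 is a hypothetical helper name, it rejects bad lead and continuation bytes but does not reject overlong encodings or UTF-8-encoded surrogates, and it produces code units in the host's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard library function): decode UTF-8 into
// UTF-16 code units stored in the host's native byte order.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;  // code point being assembled
        std::size_t len = 0;   // total bytes in this sequence

        // The lead byte determines the sequence length and its payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

In the question's program, the call utf16 = convert.from_bytes(utf8) could be replaced with utf16 = utf8_to_utf16(utf8); for the bytes eb 82 98 this assembles 0xB, 0x02 and 0x18 into the code unit b098, with no dependence on the facet's byte-order handling.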

Answered 2015-07-13T20:49:00.857