c++ - Converting from ToUnicodeEx() to UTF-8

Question

I am using GetAsyncKeyState() to get input and then ToUnicodeEx() to convert it to Unicode:

wchar_t character[1];
ToUnicodeEx(i, scanCode, keyboardState, character, 1, 0, layout);

I can write this to a file using wfstream like so:

wchar_t buffer[128]; // Will not print unicode without these 2 lines
file.rdbuf()->pubsetbuf(buffer, 128);
file.put(0xFEFF); // BOM needed since it's encoded using UCS-2 LE
file << character[0];

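When I open this file in Notepad++, it is encoded as UCS-2 LE, but I want it in UTF-8. I believe ToUnicodeEx() returns the character in UCS-2 LE, and it only works with wide characters. Is there a way to do this with fstream or wfstream by converting to UTF-8 first? Thanks!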

score 2 · Accepted Answer

You probably want to use the WideCharToMultiByte function.

For example:

wchar_t buffer[LEN]; // input buffer
char output_buffer[OUT_LEN]; // output buffer where the utf-8 string will be written
int num = WideCharToMultiByte(
    CP_UTF8,
    0,
    buffer,
    number_of_characters_in_buffer, // or -1 if buffer is null-terminated
    output_buffer,
    size_in_bytes_of_output_buffer,
    NULL,
    NULL);

score 2 · Accepted Answer

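The Windows API generally refers to UTF-16 as "unicode", which is a bit confusing. It means that most Unicode Win32 functions operate on or return UTF-16 strings.

So ToUnicodeEx returns a UTF-16 string.

If you need it as UTF-8, you will have to convert it with WideCharToMultiByte.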
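To make the UTF-16 versus UTF-8 distinction concrete, here is a minimal portable sketch of what such a conversion does at the byte level. It is not the Windows API: `Utf16ToUtf8` is a hypothetical helper written for illustration, it takes `std::u16string` instead of `wchar_t` so that it behaves the same on every platform, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch. On Windows you would normally just call WideCharToMultiByte with CP_UTF8 as described above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Win32 function): encodes UTF-16 code units as UTF-8.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) // high surrogate: needs a low surrogate next
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF) // low surrogate with no high surrogate before it
        {
            cp = 0xFFFD;
        }

        if (cp < 0x80) // 1 byte
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800) // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else // 4 bytes
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The point to take away is that one UTF-16 code unit can turn into one, two, or three UTF-8 bytes, and a surrogate pair turns into four, so the output buffer has to be sized in bytes rather than in characters.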

score 0 · Accepted Answer

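Thanks for all the help. I managed to solve my problem with some additional help from a blog post about WideCharToMultiByte() and UTF-8.

This function converts a wide character array to a UTF-8 string: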

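#include <windows.h>
#include <string>

// Takes in pointer to wide char array and length of the array
std::string ConvertCharacters(const wchar_t* buffer, int len)
{
    // First call: ask how many bytes the UTF-8 output needs.
    int nChars = WideCharToMultiByte(CP_UTF8, 0, buffer, len, NULL, 0, NULL, NULL);

    if (nChars == 0)
    {
        return std::string(); // conversion failed or there was nothing to convert
    }

    std::string newBuffer;
    newBuffer.resize(nChars);
    // Second call: write the UTF-8 bytes directly into the string's storage.
    WideCharToMultiByte(CP_UTF8, 0, buffer, len, &newBuffer[0], nChars, NULL, NULL);
    return newBuffer;
}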
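Since the original question was about getting this into a file with fstream, here is a minimal sketch of the writing side. It assumes you already have the UTF-8 bytes in a `std::string`; `AppendUtf8` and the file name in the usage note are hypothetical names chosen for illustration. The idea is to use a plain narrow `std::ofstream` in binary mode so the bytes are written unchanged. No BOM is needed for UTF-8.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends already-encoded UTF-8 bytes to a file.
// Returns false if the file could not be opened or the write failed.
bool AppendUtf8(const std::string& path, const std::string& utf8)
{
    // Binary mode keeps the stream from translating any bytes (e.g. newlines).
    std::ofstream file(path, std::ios::binary | std::ios::app);
    if (!file)
    {
        return false;
    }

    file.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return static_cast<bool>(file);
}
```

With the function above, a call might look like `AppendUtf8("keys.txt", ConvertCharacters(character, 1));`, which should replace the wfstream, pubsetbuf, and BOM lines from the question.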
