c++ - MultiByteToWideChar terminates the output buffer with garbage but reports no error. Why?

Question

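A few days ago, while developing a program, I had to convert an ASCII string to a Unicode string. By the way, I'm working on Windows with Visual Studio 2012. I noticed some strange behavior in the Win32 function MultiByteToWideChar that I can't figure out. I wrote some test code, as follows: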

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main()
{
    /* Create const test string */
    char str[] = "test string";

    /* Create empty wchar_t buffer to hold Unicode form of above string, and initialize (zero) it */
    wchar_t *buffer = (wchar_t*) LocalAlloc(LMEM_ZEROINIT, sizeof(wchar_t) * strlen(str));

    /* Convert str to Unicode and store in buffer */
    int result = MultiByteToWideChar(CP_UTF8, NULL, str, strlen(str), buffer, strlen(str));
    if (result == 0)
        printf("GetLastError result: %d\n", GetLastError());

    /* Print MultiByteToWideChar result, str's length, and buffer's length */
    printf_s(
        "MultiByteToWideChar result: %d\n"
        "'str' length: %d\n"
        "'buffer' length: %d\n",
        result, strlen(str), wcslen(buffer));

    /* Create a message box to display the Unicode string */
    MessageBoxW(NULL, buffer, L"'buffer' contents", MB_OK);

    /* Also write buffer to file, raw */
    FILE *stream = NULL;
    fopen_s(&stream, "c:\\test.dat", "wb");
    fwrite(buffer, sizeof(wchar_t), wcslen(buffer), stream);
    fclose(stream);

    return 0;
}

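As you can see, it simply takes an ordinary string, creates a buffer to hold the Unicode string, puts the converted Unicode string into the buffer, prints some results, and also writes the buffer to a file.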

Output:

MultiByteToWideChar result: 11
'str' length: 11
'buffer' length: 16

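That's already strange. The function processes the correct number of characters from the C string, but wcslen reports that the output buffer is longer than the C string! I'm also fairly sure I allocated the buffer correctly.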

I have tried strings of different lengths, but there is always garbage at the end, and wcslen always reports a buffer length that is a multiple of 4.

Finally, for this particular string ("test string"), this is the raw buffer written to the file:

74 00 65 00 73 00 74 00 20 00 73 00 74 00 72 00   t.e.s.t. .s.t.r.
69 00 6E 00 67 00 AB AB AB AB AB AB AB AB EE FE   i.n.g...........

(That is, 32 bytes, or 16 UTF-16 characters.)

The last 10 bytes are 5 characters: four U+ABAB and one U+FEEE, which make no sense to me.

Every time I try to convert a string, a different number of them shows up.

I'm running out of ideas. Anyone?

Thanks in advance!

score 5 · Accepted Answer

/* Create empty wchar_t buffer to hold Unicode form of above string, and initialize (zero) it */
wchar_t *buffer = (wchar_t*) LocalAlloc(LMEM_ZEROINIT, sizeof(wchar_t) * strlen(str));

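This is where the trouble really starts. The value strlen(str) is meaningless here, especially when the input string is UTF-8 encoded. You tend to get away with it by accident, because it usually produces a buffer that is too long rather than one that is too short.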

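But you can easily avoid the mistake by doing it the right way. You have to call the function twice. The first time, pass 0 for the last argument (cchWideChar). The function then returns the required buffer size (in characters, not bytes). You now know enough to allocate the buffer and pass the correct value when you call the function the second time.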

score 4 · Accepted Answer

(Converting my comment to an answer)

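You need to include the trailing null character in the length (pass strlen(str) + 1 instead of strlen(str)). Also, your buffer is one element too short: it needs room for the terminating null character as well.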

score 4 · Accepted Answer

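As others have commented, you are basically misusing MultiByteToWideChar() and wcslen() by not handling the null terminator correctly. If you don't include the null terminator in the input length when calling MultiByteToWideChar(), it won't output one.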

Try this:

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main()
{
    /* Create const test string */
    char str[] = "test string";
    int strLen = (int) strlen(str);

    WCHAR *buffer = NULL;
    int bufLen = 0;

    /* Calculate buffer size (in WCHARs; excludes the terminator because strLen does) */
    int result = MultiByteToWideChar(CP_UTF8, 0, str, strLen, NULL, 0);
    if (result > 0)
    {
        /* Create zero-initialized buffer with one extra element for the null terminator */
        buffer = (WCHAR*) LocalAlloc(LPTR, sizeof(WCHAR) * (result+1));
        if (buffer != NULL)
        {
            /* Convert str to Unicode and store in buffer. The input length matches the
               size query, so the output fits exactly; the zeroed extra element terminates it. */
            result = MultiByteToWideChar(CP_UTF8, 0, str, strLen, buffer, result);
            if (result > 0)
                bufLen = result;
        }
    }

    if ((!buffer) || (result == 0))
        printf("GetLastError result: %lu\n", GetLastError());

    /* Print MultiByteToWideChar result, str's length, and buffer's length */
    printf_s(
        "MultiByteToWideChar result: %d\n"
        "'str' length: %d\n"
        "'buffer' length: %d\n",
        result, strLen, bufLen);

    /* Create a message box to display the Unicode string */
    if (buffer)
        MessageBoxW(NULL, buffer, L"'buffer' contents", MB_OK);

    /* Also write buffer to file, raw */
    FILE *stream = NULL;
    errno_t err = fopen_s(&stream, "c:\\test.dat", "wb");
    if (err == 0)
    {
        fwrite(buffer, sizeof(WCHAR), bufLen, stream);
        fclose(stream);
    }
    else
        printf("Errno result: %d\n", err);

    if (buffer)
        LocalFree(buffer);

    return 0;
}

Since you are using C++, you can use std::string and std::wstring to simplify memory management:

#include <windows.h>
#include <stdio.h>
#include <string>

int main()
{
    /* Create const test string */
    std::string str = "test string";
    std::wstring buffer;

    /* Calculate buffer size */
    int result = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), NULL, 0);
    if (result > 0)
    {
        /* Allocate buffer to hold Unicode form of above string
           (std::wstring maintains its own null terminator for c_str()) */
        buffer.resize(result);

        /* Convert str to Unicode and store in buffer */
        result = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), &buffer[0], result);
    }

    if (result == 0)
        printf("GetLastError result: %lu\n", GetLastError());

    /* Print MultiByteToWideChar result, str's length, and buffer's length */
    printf_s(
        "MultiByteToWideChar result: %d\n"
        "'str' length: %d\n"
        "'buffer' length: %d\n",
        result, (int) str.length(), (int) buffer.length());

    /* Create a message box to display the Unicode string */
    MessageBoxW(NULL, buffer.c_str(), L"'buffer' contents", MB_OK);

    /* Also write buffer to file, raw */
    FILE *stream = NULL;
    errno_t err = fopen_s(&stream, "c:\\test.dat", "wb");
    if (err == 0)
    {
        fwrite(buffer.data(), sizeof(std::wstring::value_type), buffer.length(), stream);
        fclose(stream);
    }
    else
        printf("Errno result: %d\n", err);

    return 0;
}
