c++ - How do I correctly print Latin characters to the C++ console on Windows?

Question

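I'm having trouble writing French characters to the console in C++. The string is loaded from a file using std::ifstream and std::getline, then printed to the console using std::cout. Here is the string in the file:

La chaîne qui correspond au code "TEST_CODE" n'a pas été trouvée à l'aide locale "fr".

Here is how the string is printed:

La cha¯ne qui correspond au code "TEST_CODE" n'a pas ÚtÚ trouvÚe Ó l'aide locale "fr".

How can I fix this?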

score 5 · Accepted Answer

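The problem is that the console uses a different code page than the rest of the system. For example, Windows systems set up for the Americas and Western Europe typically use CP1252, but the console in those regions uses CP437 or CP850.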

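You can either set the console output code page to match the encoding you're using, or convert the strings to match the console's output code page.

Setting the console output code page: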

SetConsoleOutputCP(GetACP()); // GetACP() returns the system codepage.
std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".";

Or, one of many ways to convert between encodings (this one requires VS2010 or later):

#include <codecvt> // for wstring_convert
#include <locale>  // for codecvt_byname
#include <iostream>

int main() {
    typedef std::codecvt_byname<wchar_t,char,std::mbstate_t> codecvt;

    // the following relies on non-standard behavior, codecvt destructors are supposed to be protected and unusable here, but VC++ doesn't complain.
    std::wstring_convert<codecvt> cp1252(new codecvt(".1252"));
    std::wstring_convert<codecvt> cp850(new codecvt(".850"));

    std::cout << cp850.to_bytes(cp1252.from_bytes("...été trouvée à...\n")).c_str();
}

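The latter example assumes that you do in fact need to convert between 1252 and 850. You should probably use the function GetOEMCP() to find the actual target code page, and the source code page actually depends on what you use for the source code rather than on the result of GetACP() on the machine running the program.

Also note that this program relies on something not guaranteed by the standard: that the wchar_t encoding is shared between locales. This is true on most platforms, where some Unicode encoding is usually used for wchar_t in all locales, but not on all of them.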

Ideally, you could just use UTF-8 everywhere, and the following would work fine, as it does on other platforms these days:

#include <iostream>

int main() {
    std::cout << "La chaîne qui correspond au code \"TEST_CODE\" n'a pas été trouvée à l'aide locale \"fr\".\n";
}

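Unfortunately, Windows can't support UTF-8 this way without either abandoning UTF-16 as the wchar_t encoding and adopting a 4-byte wchar_t, or violating the requirements of the standard and breaking standard-conforming programs.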

score 3 · Accepted Answer

If you want to write Unicode characters to the console, you have to do some initialization:

_setmode(_fileno(stdout), _O_U16TEXT);

Then your French characters are displayed correctly (I tested it using Consolas as the console font):

#include <fcntl.h>
#include <io.h>

#include <iostream>
#include <ostream>
#include <string>

using namespace std;

int main() 
{
    // Prepare console output in Unicode
    _setmode(_fileno(stdout), _O_U16TEXT);


    //
    // Build Unicode UTF-16 string with French characters
    //

    // 0x00EE - LATIN SMALL LETTER I WITH CIRCUMFLEX
    // 0x00E9 - LATIN SMALL LETTER E WITH ACUTE
    // 0x00E0 - LATIN SMALL LETTER A WITH GRAVE

    wstring str(L"La cha");
    str += L'\x00EE';
    str += L"ne qui correspond au code \"TEST_CODE\" ";
    str += L"n'a pas ";
    str += L'\x00E9';
    str += L't';
    str += L'\x00E9';
    str += L" trouv";
    str += L'\x00E9';
    str += L"e ";
    str += L'\x00E0';
    str += L" l'aide locale \"fr\".";


    // Print the string to the console
    wcout << str << endl;  
}

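Consider reading the following blog posts by Michael Kaplan:

Moreover, if you are reading some text from a file, you must know which encoding is used: UTF-8? UTF-16LE? UTF-16BE? Some specific code page? Then you can convert from that encoding to Unicode UTF-16 and use UTF-16 inside your Windows application. To convert from some code page (or from UTF-8) to UTF-16, you can use the MultiByteToWideChar() API or the ATL conversion helper class CA2W.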
