c++ - Loading and saving an HTML file with Polish characters

Question

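I need to load an HTML template file (using std::ifstream), add some content, and then save it as a complete web page. This would be simple if it weren't for Polish characters. I have tried every combination of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8 and ANSI/utf-8, but none of them work for me: there are always some characters that are displayed incorrectly, or some that are not displayed at all.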

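I could paste a lot of code and files here, but I'm not sure that would help. Maybe you can just tell me: what format/encoding should the template file have, what encoding should I declare in it for the web page, and how should I load and save the file to get correct results?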

(If my question is not specific enough, or you really need code/file samples, please let me know.)

Edit: I have tried the library suggested in the comments:

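#include <iterator>
#include <string>
#include "utf8.h" // utfcpp

std::string fix_utf8_string(std::string const & str)
{
    std::string temp;
    // Copy str into temp, replacing invalid UTF-8 sequences along the way.
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    return temp; // return the cleaned copy, not the unchanged input
}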

Called as:

fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");

It throws utf8::not_enough_room. What am I doing wrong?
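A likely explanation, assuming the .cpp file itself is saved in the Windows "ANSI" code page 1250 (the same encoding the accepted answer's ToUtf8 converts from): the string literal then holds single-byte Windows-1250 characters, not UTF-8. The last character, ę, becomes the byte 0xEA, which in UTF-8 is the lead byte of a three-byte sequence. The string ends right after it, so a UTF-8 decoder runs out of input, which is the condition utfcpp reports as not_enough_room. The hypothetical checker below (CheckUtf8 and Utf8Status are illustrative names, not part of utfcpp) shows the difference between the two encodings of ę:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the illustration.
enum class Utf8Status { Valid, Invalid, Truncated };

// Minimal UTF-8 structure check: lead byte plus continuation bytes only.
// It does not reject overlong forms or surrogates. "Truncated" is the
// situation utfcpp reports as not_enough_room.
Utf8Status CheckUtf8(const std::string & bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length;
        if (lead < 0x80)                 length = 1; // 0xxxxxxx (ASCII)
        else if ((lead & 0xE0) == 0xC0)  length = 2; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0)  length = 3; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0)  length = 4; // 11110xxx
        else return Utf8Status::Invalid;             // e.g. a stray 10xxxxxx byte

        for (std::size_t k = 1; k < length; ++k) {
            if (i + k >= bytes.size())
                return Utf8Status::Truncated;        // input ends mid-sequence
            unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80)
                return Utf8Status::Invalid;          // expected 10xxxxxx
        }
        i += length;
    }
    return Utf8Status::Valid;
}
```

For example, CheckUtf8("\xC4\x99") (ę in UTF-8) is Valid, CheckUtf8("\xEA") (ę in Windows-1250) is Truncated, and CheckUtf8("\xB9") (ą in Windows-1250, a bare continuation byte) is Invalid.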

score 0 · Accepted Answer

I'm not sure whether this is the (perfect) way to do it, but the following solution works for me!

I saved my HTML template file as ANSI (or at least that is what Notepad++ says) and changed every write-to-file-stream operation from:

file << std::string("some text with polish chars: ąśżźćńłóę");

to:

file << ToUtf8("some text with polish chars: ąśżźćńłóę");

where:

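#include <windows.h>
#include <string>

// Converts a string in code page 1250 (Windows Central European "ANSI")
// to UTF-8, using UTF-16 (wchar_t) as the intermediate step.
std::string ToUtf8(const std::string & ansiText)
{
    if (ansiText.empty())
        return std::string();

    // Step 1: code page 1250 -> UTF-16. The first call only asks for the size.
    int wideSize = MultiByteToWideChar(1250, 0, ansiText.c_str(), (int)ansiText.size(), NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.c_str(), (int)ansiText.size(), &wideText[0], wideSize);

    // Step 2: UTF-16 -> UTF-8 (CP_UTF8 == 65001). The output buffer is sized
    // from the first call, so long strings cannot overflow a fixed array.
    int utf8Size = WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize, NULL, 0, NULL, NULL);
    std::string utf8Text(utf8Size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize, &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}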

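The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string from ANSI (multi-byte) to wide characters, and then from wide characters to UTF-8 (for more information see: http://www.chilkatsoft.com/p/p_348.asp). The best part is that I didn't have to change anything else: no switching from std::ofstream to std::wofstream, no third-party library, and no change to how I actually use the file stream (other than converting the strings to UTF-8)!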
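The same two-step idea (decode each code page 1250 byte to a Unicode code point, then encode that code point as UTF-8) can be sketched without the Windows API. This is a hypothetical, deliberately incomplete illustration: the function names are made up, and the table covers only ASCII and the 18 Polish letters of code page 1250, writing '?' for any other byte. A real converter needs the full code page table, which is what the Windows calls above provide.

```cpp
#include <cstdint>
#include <string>

// Maps one code page 1250 byte to a Unicode code point.
// Only ASCII and the Polish letters are covered; 0 means "not in the table".
std::uint32_t Cp1250PolishToCodePoint(unsigned char b)
{
    if (b < 0x80) return b;
    switch (b) {
        case 0xA5: return 0x0104; case 0xB9: return 0x0105; // Ą ą
        case 0xC6: return 0x0106; case 0xE6: return 0x0107; // Ć ć
        case 0xCA: return 0x0118; case 0xEA: return 0x0119; // Ę ę
        case 0xA3: return 0x0141; case 0xB3: return 0x0142; // Ł ł
        case 0xD1: return 0x0143; case 0xF1: return 0x0144; // Ń ń
        case 0xD3: return 0x00D3; case 0xF3: return 0x00F3; // Ó ó
        case 0x8C: return 0x015A; case 0x9C: return 0x015B; // Ś ś
        case 0x8F: return 0x0179; case 0x9F: return 0x017A; // Ź ź
        case 0xAF: return 0x017B; case 0xBF: return 0x017C; // Ż ż
        default:   return 0;
    }
}

// Hypothetical portable stand-in for ToUtf8 (Polish subset only).
std::string Cp1250PolishToUtf8(const std::string & ansiText)
{
    std::string out;
    for (char c : ansiText) {
        std::uint32_t cp = Cp1250PolishToCodePoint(static_cast<unsigned char>(c));
        if (cp == 0 && c != '\0') {
            out += '?';                                    // byte not in the table
        } else if (cp < 0x80) {
            out += static_cast<char>(cp);                  // 1 byte: 0xxxxxxx
        } else {
            // Every code point in the table is below U+0800, so two bytes
            // suffice: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, Cp1250PolishToUtf8("\xEA") (ę as stored in an ANSI/1250 file) yields the two UTF-8 bytes "\xC4\x99", and "\xB3\xF3" (łó) becomes "\xC5\x82\xC3\xB3".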

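It will probably work for other languages too (with code page 1250 replaced by the one matching the language), although I haven't tested it.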
