c++ - Handling Hunspell suggestions with special characters

Question

I have integrated Hunspell into an unmanaged C++ application on Windows 7 using Visual Studio 2010.

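I have spell checking and suggestions working for English, but now I am trying to get things working for Spanish and have hit a snag. Whenever I get suggestions for Spanish, the suggestions that contain accented characters are not converted correctly to std::wstring objects.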

Here is an example of the suggestions returned from the Hunspell->suggest method:

[Image: Hunspell->suggest(...) results]

Here is the code I use to convert the std::string to a std::wstring:

std::wstring StringToWString(const std::string& str)
{
    std::wstring convertedString;
    int requiredSize = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, 0, 0);
    if(requiredSize > 0)
    {
        std::vector<wchar_t> buffer(requiredSize);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &buffer[0], requiredSize);
        convertedString.assign(buffer.begin(), buffer.end() - 1);
    }

    return convertedString;
}

After I run it through that, I get this, with a funky character at the end.

[Image: after conversion to wstring]

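Can anyone help me figure out what is going wrong in the conversion here? My guess is that it has to do with the negative char values returned from Hunspell, but I don't know how to account for that in the std::wstring conversion code.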

score 1 · Accepted Answer

It looks like the output of Hunspell is not UTF-8 but 8-bit text in code page 852. Use 852 instead of CP_UTF8. The code page identifiers are listed at http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx

Or configure Hunspell to return UTF-8.
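As far as I know, Hunspell returns suggestions in the encoding declared by the SET line of the dictionary's .aff file, and Hunspell::get_dic_encoding() reports that name. Rather than hard-coding a code page, you could look it up from that name. The following is only a sketch: CodePageForEncoding is a hypothetical helper (not part of Hunspell), and the table covers just a few encoding names.

```cpp
#include <string>

// Hypothetical helper, not part of Hunspell: map an encoding name as it
// appears on the SET line of a .aff file to a Windows code page identifier.
// Returns -1 for names this sketch does not cover.
int CodePageForEncoding(const std::string& encoding)
{
    if (encoding == "UTF-8")      return 65001; // same value as CP_UTF8
    if (encoding == "ISO8859-1")  return 28591; // Latin 1, Western European
    if (encoding == "ISO8859-2")  return 28592; // Latin 2, Central European
    if (encoding == "ISO8859-15") return 28605; // Latin 9
    if (encoding == "KOI8-R")     return 20866; // Russian
    return -1;
}
```

The result would then be passed as the first argument of MultiByteToWideChar in place of CP_UTF8, after checking for -1.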

score 1 · Accepted Answer

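It looks like the output of Hunspell is 8-bit text in code page 28591 (ISO 8859-1 Latin 1; Western European (ISO)), not UTF-8. I found this by looking at the default settings of the Hunspell Unix command-line utility.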

Changing CP_UTF8 to 28591 worked for me.

// Updated code page to 28591 from CP_UTF8
std::wstring StringToWString(const std::string& str)
{
    std::wstring convertedString;
    int requiredSize = MultiByteToWideChar(28591, 0, str.c_str(), -1, 0, 0);
    if(requiredSize > 0)
    {
        std::vector<wchar_t> buffer(requiredSize);
        MultiByteToWideChar(28591, 0, str.c_str(), -1, &buffer[0], requiredSize);
        convertedString.assign(buffer.begin(), buffer.end() - 1);
    }

    return convertedString;
}
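On the "negative chars" guess in the question: the bytes for accented letters are above 0x7F, so they show up as negative values when char is signed. For ISO 8859-1 specifically, each byte value equals its Unicode code point, so the conversion can also be sketched without any Windows API. Latin1ToWString is a name I made up for illustration, and this only holds if the input really is ISO 8859-1.

```cpp
#include <string>

// Illustrative sketch: widen ISO 8859-1 bytes to wchar_t one by one.
// Valid only for ISO 8859-1 input, where byte value == Unicode code point.
std::wstring Latin1ToWString(const std::string& str)
{
    std::wstring result;
    result.reserve(str.size());
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        // Cast through unsigned char first: plain char may be signed, so a
        // byte such as 0xF3 would otherwise sign-extend to a negative value.
        result.push_back(static_cast<wchar_t>(static_cast<unsigned char>(str[i])));
    }
    return result;
}
```

For example, the bytes "canci\xF3n" would widen to the seven characters of "canción", with the sixth one being U+00F3.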

The list of code pages on MSDN helped me find the correct code page integer.
