c++ - Failing to find a wchar_t that is present in std::wstring

Question

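I was playing with std::wstring and std::wfstream when I ran into some strange behavior. Namely, it seems that std::basic_string<wchar_t>::find fails to find certain characters. Consider the following code: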

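#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::wifstream input("input.txt");
    std::wofstream output("output.txt");

    if(!(input && output)){
        std::cerr << "file(s) not opened";
        return -1;
    }

    // Read the first line of the input file into a wide string.
    std::wstring buf;
    std::getline(input, buf);

    // Copy it to the output file.
    output << buf;

    // Search for the non-ASCII character; this prints npos.
    std::cout << buf.find(L'ć');
}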

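Here I simply read the first line of the input file and write it to the output file. Before the program runs, the input file contains aąbcćd and the output file is empty. After the code runs, the input file has been successfully copied to the output file.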

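What surprised me is that when I tried to find the letter ć in buf, I ran into the strange behavior mentioned above. After the program ran, I confirmed that the output file contains exactly aąbcćd, which clearly includes the character ć.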

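However, the line std::cout << buf.find(L'ć') did not behave as expected. Given the memory layout of std::wstring, I did not expect the output to be 4, but I definitely did not expect std::string::npos either. It is worth mentioning that finding regular ASCII characters this way succeeds.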

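To sum up, the code above correctly copies the first line of the input file to the output file, but it fails to find a character (returning npos) in the very string that holds the data being copied. Why? What causes find to fail here?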

Note: both files are UTF-8 encoded, and this is on Windows.

score 1 · Accepted Answer

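Unfortunately, wchar_t is not UTF-8. On Windows it is UTF-16, and no magic conversion happens when you read a UTF-8 file. If you debug your program, you will see corrupted characters in the buf variable.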

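You either need to read the string as a std::string and then convert it from UTF-8 to wchar_t, or work in UTF-8 throughout and change the literal you search for from a wchar_t character to a UTF-8 std::string.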

If you are using a recent compiler, you can create a UTF-8 string literal with the following:

u8"ć"

The following should work:

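#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Narrow streams pass the UTF-8 bytes through unchanged.
    std::ifstream input("input.txt");
    std::ofstream output("output.txt");

    if(!(input && output)){
        std::cerr << "file(s) not opened";
        return -1;
    }

    std::string buf;
    std::getline(input, buf);

    output << buf;

    // Before C++20, u8"ć" is a const char[] holding the UTF-8 bytes of ć.
    // From C++20 on it is const char8_t[], so this call needs a cast
    // or a plain "\xC4\x87" literal instead.
    // Note that find returns a byte offset, not a character index.
    std::cout << buf.find(u8"ć");
}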

