
I'm trying to write strings containing non-ASCII characters, such as "maçã" and "pé", to a file.

I'm currently doing something like this:

_setmode(_fileno(stdout), _O_U16TEXT);

//I added the line above recently to the question,
//but it was in the code before, I forgot to write it
//I also included some header files, to be able to do that
//can't really remember which, if necessary I'll look it up.


wstring word=L"";
wstring file = L"example_file.txt";
vector<wstring> my_vector;

wofstream my_output(file);

while(word != L".")
{
 getline(wcin, word);
 if(word!= L".")
   my_vector.push_back(word);
}

for(std::vector<wstring>::iterator j=my_vector.begin(); j!=my_vector.end(); j++)
    {
        my_output << *j << endl;
//element pointed by iterator going through the whole vector

        my_output << L"maçã pé" << endl;
    }
my_output.close();

Now, if I enter "maçã", "pé" and "." as words (only the 1st two are stored in the vector), the output to the file is rather strange:

  • the words I entered (stored in variables) appear strange: "ma‡Æ" and "p,";
  • the words stored directly in the code appear perfectly normal "maçã pé";

I have tried using wcin >> word instead of getline(wcin, word), and writing to the console instead of a file. The results are the same: strings read into variables are written wrong, while string literals in the code are written perfectly.

I cannot find a reason for this to happen, so any help will be greatly appreciated.

Edit: I am working in Windows 7, using Visual C++ 2010

Edit 2: added one more line of code, that I had missed. (right in the beginning)

EDIT 3: following SigTerm's suggestion, I realised the problem is with the input: neither wcin nor getline are getting the strings with right formatting to variable wstring word. So, the question is, do you know what is causing this or how to fix it?


3 Answers

3

Try including

#include <locale>

and, at the beginning of main, write

std::locale::global(std::locale(""));
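
A slightly fuller sketch of the same idea follows. The helper name use_environment_locale is hypothetical, and the explicit imbue calls are my addition: as far as I know, setting the global locale is only guaranteed to affect streams created afterwards, so imbuing wcin and wcout directly is the safer assumption. The fallback to the classic "C" locale is also an assumption, for the case where the environment names a locale that is not installed.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: switch to the user's environment locale and apply it
// to the standard wide streams. Returns the name of the locale now in effect.
std::string use_environment_locale() {
    std::locale loc = std::locale::classic();
    try {
        loc = std::locale("");  // user-preferred locale from the environment
    } catch (const std::runtime_error&) {
        // The environment names a locale that is not available: keep "C".
    }
    std::locale::global(loc);   // default for streams created from now on
    std::wcin.imbue(loc);       // streams that already exist are imbued explicitly
    std::wcout.imbue(loc);
    return loc.name();
}
```

Call use_environment_locale() once at the start of main, before the first getline(wcin, word).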
answered 2013-09-28T18:00:46.067
1

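Windows makes encodings confusing because the console typically uses an "OEM" code page, while GUI applications use an "ANSI" code page. Each varies with the localized version of Windows in use. On US Windows, the OEM code page is 437 and the ANSI code page is 1252.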
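
As an illustration of that mismatch, here is a minimal sketch that decodes the same bytes with two different code pages. It assumes the asker's console uses OEM code page 850, which the question does not state; 850 is common on Western European installs and, unlike 437, contains ã. The function names and the three-entry tables are hypothetical and cover only the bytes needed here; results are returned as UTF-8.

```cpp
#include <string>

// Partial code page 850 table (assumed console code page), UTF-8 results.
std::string cp850_to_utf8(unsigned char b) {
    switch (b) {
        case 0x87: return "\xC3\xA7";  // U+00E7 c with cedilla
        case 0xC6: return "\xC3\xA3";  // U+00E3 a with tilde
        case 0x82: return "\xC3\xA9";  // U+00E9 e with acute
    }
    if (b < 0x80) return std::string(1, static_cast<char>(b));  // ASCII
    return "?";  // byte not covered by this partial table
}

// Partial Windows-1252 table for the same three bytes, UTF-8 results.
std::string cp1252_to_utf8(unsigned char b) {
    switch (b) {
        case 0x87: return "\xE2\x80\xA1";  // U+2021 double dagger
        case 0xC6: return "\xC3\x86";      // U+00C6 AE ligature
        case 0x82: return "\xE2\x80\x9A";  // U+201A single low-9 quote
    }
    if (b < 0x80) return std::string(1, static_cast<char>(b));  // ASCII
    return "?";  // byte not covered by this partial table
}

typedef std::string (*ByteDecoder)(unsigned char);

// Decode a byte string one byte at a time with the given table.
std::string decode(const std::string& bytes, ByteDecoder table) {
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += table(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

With these tables, decode("ma\x87\xC6", cp850_to_utf8) gives "maçã", while the same bytes through cp1252_to_utf8 give "ma‡Æ", which matches the garbled output in the question. Likewise "p\x82" becomes "pé" in 850 and "p‚" in 1252.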

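With that in mind, setting the streams to the locale actually in use fixes the problem. If you are working in the console, use the console's code page: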

wcin.imbue(std::locale("English_United States.437"));
wcout.imbue(std::locale("English_United States.437"));

But keep in mind that most code pages are single-byte encodings, so each one understands only 256 of the possible Unicode characters:

wstring word;
wcin.imbue(std::locale("English_United States.437"));
wcout.imbue(std::locale("English_United States.437"));
getline(wcin, word);
wcout << word << endl;
wcout << L"maçã pé" << endl;

This prints the following on the console:

maça pé
maça pé

Code page 437 does not contain ã.

You can use code page 1252 from the console if you:

  • Issue chcp 1252.
  • Use a TrueType console font such as Consolas or Lucida Console.
  • Imbue the streams with English_United States.1252 instead.

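Writing to a file has similar issues. If you view the file in Notepad, it uses the ANSI code page to interpret the bytes in the file. So even if your console application uses code page 437, Notepad will display the file incorrectly if it was written with code page 437. Writing the file in code page 1252 won't help either, because the two code pages do not cover the same set of Unicode code points. Some ways around this are to use a different file viewer, such as Notepad++, or to write the file in UTF-8, which supports all Unicode characters.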
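
A minimal sketch of the UTF-8 option, under some assumptions: the helper names append_utf8, to_utf8 and write_utf8_file are hypothetical, the conversion is written by hand rather than taken from a library facet, and no byte order mark is written. It treats wchar_t values as UTF-16 code units when they form a surrogate pair (the Windows case) and as whole code points otherwise.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Append one Unicode code point to a UTF-8 byte string.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8, combining UTF-16 surrogate pairs if present.
std::string to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Write each wide string as one UTF-8 line. Binary mode keeps the bytes
// exactly as encoded. Returns false if the file could not be written.
bool write_utf8_file(const char* path, const std::vector<std::wstring>& lines) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    for (std::vector<std::wstring>::size_type i = 0; i < lines.size(); ++i)
        out << to_utf8(lines[i]) << '\n';
    return out.good();
}
```

This only helps once the input has been read correctly: if wcin delivers the wrong characters, the file will contain those wrong characters, correctly encoded as UTF-8.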

answered 2013-09-29T17:44:38.703
0
answered 2013-09-29T04:43:57.093