c++ - Converting a std::string holding a binary byte sequence into a std::wstring in the current locale's character set

Question

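At the moment I read the same file twice, because I need two different representations: (a) the raw byte sequence without any conversion, and (b) a text representation in which the bytes have been converted to the current execution character set. Basically, the code looks like this: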

#include <fstream>
#include <locale>
#include <string>

using namespace std;
const char* fileName = "test.txt";

// Part 1: Read the file unmodified, byte by byte
string binContent;
ifstream file1( fileName, ifstream::binary );
while( true ) {
    char c;
    file1.get( c );
    if( !file1.good() ) break;
    binContent.push_back( c );
}

// Part 2: Read the file and convert the character code according to the
// current locale from external character set to execution character set
wstring textContent;
wifstream file2( fileName );
file2.imbue( locale("") );  // imbue the stream object, not the wifstream type
while( true ) {
    wchar_t c;
    file2 >> c;             // note: operator>> skips whitespace by default
    if( !file2.good() ) break;
    textContent.push_back( c );
}

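Obviously, the code reads the same file twice. I would like to avoid that and convert binContent directly into textContent in memory.

Note that this is not just a simple char-to-wchar_t conversion: if the character encoding of the current locale, locale(""), differs from the execution character encoding, a real character conversion may be involved. Such a conversion could be necessary even if textContent were a narrow string.

In the example above, the magic of the character conversion in Part 2 happens in template<typename _CharT, typename _Traits> bool basic_filebuf<_CharT, _Traits >::_M_convert_to_external( _CharT* __ibuf, streamsize __ilen ) in fstream.tcc, and it involves the codecvt facet of the locale.

I was hoping there would be a way to construct a wistringstream from binContent, instead of a wifstream, and then imbue the wistringstream with the appropriate locale. But that does not seem to work, because all constructors of wistringstream already expect wide characters, and wistringstream does not seem to implement the code conversion that wifstream does.

Is there a better way (i.e. more concise and less error-prone) than using codecvt manually?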
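For context, the manual codecvt route mentioned above might look roughly like the sketch below. The helper name widen_with_locale is hypothetical and not part of the question. The sketch assumes the whole input is available in memory, that the encoding never produces more wide characters than input bytes, and that an incomplete trailing multibyte sequence should be reported as an error. It also shows why this route is easy to get wrong: there are six pointers, a conversion state, and four result codes to keep track of.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the question): converts raw bytes to wide
// characters using the codecvt facet of the given locale.
std::wstring widen_with_locale(const std::string& bytes,
                               const std::locale& loc = std::locale(""))
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& cvt = std::use_facet<Facet>(loc);

    // Assumption: one wchar_t per input byte is always enough room.
    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty()) return out;

    std::mbstate_t state = std::mbstate_t();   // initial conversion state
    const char* from      = bytes.data();
    const char* from_end  = from + bytes.size();
    const char* from_next = from;
    wchar_t* to      = &out[0];
    wchar_t* to_end  = to + out.size();
    wchar_t* to_next = to;

    const std::codecvt_base::result r =
        cvt.in(state, from, from_end, from_next, to, to_end, to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed: widen byte by byte.
        for (std::size_t i = 0; i < bytes.size(); ++i)
            out[i] = static_cast<wchar_t>(static_cast<unsigned char>(bytes[i]));
        return out;
    }
    if (r != std::codecvt_base::ok) {
        // error: invalid byte sequence; partial: truncated trailing sequence.
        throw std::runtime_error("byte sequence is not valid in this locale");
    }

    out.resize(static_cast<std::size_t>(to_next - to));  // drop unused tail
    return out;
}
```

With such a helper, Part 2 would reduce to something like textContent = widen_with_locale( binContent ); and the file would only need to be read once. Unlike Part 2 as written, this keeps whitespace, because nothing goes through operator>>.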
