If I read a UTF-8 encoded file such as

example.html

<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Текст на русском</title>

where "Текст на русском" is text in Russian, using this program:

#include <string>
#include <ios>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <io.h>
#include <stdio.h>
#include <stdlib.h>   // exit

using namespace std;

int main()
{
    int fl;                        // file descriptor
    int bspr;                      // bytes read; signed, because _read returns -1 on error
    unsigned int nbytes = 60000;
    char buf[60000];

    // Open read-only in Unicode (UTF-8) translation mode, with shared access.
    errno_t err = _wsopen_s(&fl, L"c:\\example.html", _O_U8TEXT, _SH_DENYNO, _S_IREAD | _S_IWRITE);
    if (err != 0) exit(1);

    if ((bspr = _read(fl, buf, nbytes)) <= 0)
    {
        perror("Error reading file");
        exit(1);
    }
}

I get buf[0]=60 '<', buf[1]=0, buf[2]=104 'h', buf[3]=0, and so on,

until I reach the Russian letters; then I get completely wrong symbols such as 20 '' each followed by 4 ''.

The quoted character is what Visual Studio displays for that value; strangely, it is the same for 20 and 4.

So the question is: is there any way to read the file into a string up to EOF, properly decoded, even if that means not using this function?

1 Answer

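It looks like _O_U8TEXT causes _read to convert the data from UTF-8 to UTF-16. You should probably read with the high-level Unicode functions, such as getwc, when the stream is opened in Unicode mode. You can use _wfopen_s with L"rt, ccs=UTF-8", or, if you need sharing support, keep your existing _wsopen_s call and then use _wfdopen.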
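
To make both points concrete, here is a minimal sketch, not a definitive implementation. The first function shows why the buffer in the question looks wrong: each pair of bytes is one little-endian UTF-16 code unit, so 60,0 is '<' and a Cyrillic letter comes out as a low byte followed by 4. The second function, which is Windows-only and therefore guarded, opens the file with _wfopen_s and "rt, ccs=UTF-8" and reads it with fgetwc until WEOF. The helper names units_from_utf16le and read_utf8_file are made up for illustration and are not part of the CRT.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>
#include <string>

// Reassemble the little-endian byte pairs found in the question's buffer
// into UTF-16 code units. A trailing odd byte, if any, is ignored.
// (Hypothetical helper, for illustration only.)
std::u16string units_from_utf16le(const unsigned char* buf, std::size_t nbytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < nbytes; i += 2) {
        out.push_back(static_cast<char16_t>(buf[i] | (buf[i + 1] << 8)));
    }
    return out;
}

#ifdef _WIN32
// Windows-only: read a whole UTF-8 file into a wide string through a
// Unicode-mode stream. Returns false if the file cannot be opened.
// (Hypothetical helper, for illustration only.)
bool read_utf8_file(const wchar_t* path, std::wstring& text)
{
    FILE* fp = nullptr;
    if (_wfopen_s(&fp, path, L"rt, ccs=UTF-8") != 0 || fp == nullptr) {
        return false;
    }
    wint_t ch;
    while ((ch = fgetwc(fp)) != WEOF) {
        text.push_back(static_cast<wchar_t>(ch));  // one UTF-16 code unit
    }
    fclose(fp);
    return true;
}
#endif
```

On Windows, a call such as read_utf8_file(L"c:\\example.html", text) would leave the page in text as UTF-16, with the Russian title intact. The same fgetwc loop should work on a FILE* obtained another way, as long as the stream is in Unicode mode.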

answered 2011-05-22T19:26:05.607