c - Unix: Why does reading wide characters in C stop after the ASCII?

Question

Contents of characters.txt (as shown by od -c):

0000000   %   (   )   *   +   ,   -   .   /   0   1   2   3   4   5   6
0000020   7   8   9   <   =   >   ?   [   ]  \n   A   B   C   D   E   F
0000040   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V
0000060   W   X   Y   Z  \n   a   b   c   d   e   f   g   h   i   j   k
0000100   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z  \n
0000120 316 223 316 224 316 230 316 233 316 236 316 243 316 246 316 250
0000140 316 251 316 261 316 262 316 263 316 264 316 265 316 266 316 267
0000160 316 270 316 271 316 272 316 273 316 274 316 275 316 276 316 277
0000200 317 200 317 201 317 202 317 203 317 204 317 205 317 206 317 207
0000220 317 210 317 211  \n

That is, some ASCII followed by some Greek in UTF-8. I want to read these characters (the following was written after the example given in the glibc info pages):

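#include <stdio.h>
#include <wchar.h>

wint_t *read_characters(void) {
    const char *filename = "characters.txt";
    FILE *infile = fopen(filename, "rb");
    if (infile == NULL)   /* nothing to read if the file cannot be opened */
        return NULL;
    printf("File orientation: %d\n", fwide(infile, 0));
    static wint_t b[16384];
    const size_t capacity = sizeof b / sizeof b[0];   /* elements, not bytes */
    wint_t c, *p = b;
    /* Leave room for the WEOF terminator stored after the loop. */
    while ((size_t)(p - b) < capacity - 1 && (c = fgetwc(infile)) != WEOF)
        *p++ = c;
    *p++ = WEOF;
    fclose(infile);
    printf("\nRead %td wint_t chars from characters.txt\n", p - b);
    return b;
}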

The output is:

File orientation: 0

Read 81 wint_t chars from characters.txt

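This means that reading stops at the first Greek character (80 ASCII characters plus the WEOF terminator). Why? I am not using a signed variable that could be mistaken for WEOF. Can anyone help?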
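One way to narrow this down is to check why fgetwc returned WEOF. The sketch below is only an illustration: the names stop_reason and drain_wide are hypothetical and not part of the code above. It relies on fgetwc storing EILSEQ in errno when it meets a byte sequence it cannot decode, and on feof reporting a genuine end-of-file.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical result codes for this sketch. */
enum stop_reason { STOP_EOF, STOP_ENCODING, STOP_IO_ERROR };

/* Hypothetical helper: read wide characters until fgetwc returns WEOF,
   store how many were read in *count, and report why reading stopped. */
static enum stop_reason drain_wide(FILE *in, size_t *count)
{
    *count = 0;
    errno = 0;                      /* so a later EILSEQ is known to be fresh */
    while (fgetwc(in) != WEOF)
        ++*count;
    if (errno == EILSEQ)            /* bytes not decodable in the current locale */
        return STOP_ENCODING;
    return feof(in) ? STOP_EOF : STOP_IO_ERROR;
}
```

Called on a freshly opened characters.txt, a result of STOP_ENCODING rather than STOP_EOF would indicate that the stop at the first Greek character is a decoding failure and not the end of the file.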

score 1 · Accepted Answer

The solution (hinted at by nm) is to include this call:

setlocale(LC_ALL, "en_US.UTF-8");

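This is necessary even if LC_ALL is set globally in the environment, because a C program always starts in the "C" locale. If you want anything else, you always have to set it yourself. Passing an empty string, setlocale(LC_ALL, ""), selects whatever locale the environment specifies.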
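Put together, the fix might look like the following minimal sketch. The helper names use_utf8_locale and read_wide_file are hypothetical, and the locale names tried are assumptions: which ones are installed depends on the system. The locale is selected before the first wide-character read, since the stream's conversion is tied to the locale in effect when the stream becomes wide-oriented.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: try a few UTF-8 locale names for LC_CTYPE.
   The names are examples only; returns 1 if one could be selected. */
static int use_utf8_locale(void)
{
    static const char *const names[] = { "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: read up to cap wide characters from path into out.
   Returns the number read, or -1 if the file cannot be opened.
   Call it only after the locale has been selected. */
static long read_wide_file(const char *path, wchar_t *out, size_t cap)
{
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    size_t n = 0;
    wint_t c;
    while (n < cap && (c = fgetwc(in)) != WEOF)
        out[n++] = (wchar_t)c;
    fclose(in);
    return (long)n;
}
```

A caller would run use_utf8_locale() once at startup and then call read_wide_file("characters.txt", buffer, capacity). With a UTF-8 locale in place, each two-byte Greek sequence in the file should come back as a single wide character.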
