
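Please take a look at this snippet I wrote. It is supposed to simply convert a multibyte string (read from standard input) into a wide string. After reading the cppreference documentation for mbsrtowcs and mbstate_t, I believe it is valid: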

#include <stdio.h>
#include <wchar.h>
#include <errno.h>
#include <stdlib.h>
#include <error.h>

int main()
{
        char *s = NULL; size_t n = 0; errno = 0;
        ssize_t sn = getline(&s, &n, stdin);
        if(sn == -1 && errno != 0)
                error(EXIT_FAILURE, errno, "getline");
        if(sn == -1 && errno == 0) // EOF
                return EXIT_SUCCESS;

        // determine how big should be the allocated buffer
        const char* cs = s; mbstate_t st = {0}; // const copy of s avoids compiler warnings
        size_t wn = mbsrtowcs(NULL, &cs, 0, &st);
        if(wn == (size_t)-1)
                error(EXIT_FAILURE, errno, "first mbsrtowcs");

        wchar_t* ws = malloc((wn+1) * sizeof(wchar_t));
        if(ws == NULL)
                error(EXIT_FAILURE, errno, "malloc");

        // finally convert the multibyte string to wide string
        st = (mbstate_t){0};
        if(mbsrtowcs(ws, &cs, wn+1, &st) == (size_t)-1)
                error(EXIT_FAILURE, errno, "second mbsrtowcs");

        if(printf("%ls", ws) < 0)
                error(EXIT_FAILURE, errno, "printf");

        return EXIT_SUCCESS;
}

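Yes, the program works for ASCII strings. But the whole reason I'm handling non-ASCII strings is that I want to support diacritics outside the ASCII table, and it fails for those. The first call to mbsrtowcs fails with EILSEQ, which indicates that the multibyte string is invalid. Strangely, though, when I inspect the string with gdb, it looks valid (at least gdb displays it correctly). Here is what happens when I feed this snippet a non-ASCII string, followed by a gdb session: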

m@m-X555LJ:~/wtfdir$ gcc -g -o wtf wtf.c
m@m-X555LJ:~/wtfdir$ ./wtf
asa
asa
m@m-X555LJ:~/wtfdir$ ./wtf
ąsa
./wtf: first mbsrtowcs: Invalid or incomplete multibyte or wide character
m@m-X555LJ:~/wtfdir$ gdb ./wtf
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./wtf...done.
(gdb) break 18
Breakpoint 1 at 0x93b: file wtf.c, line 18.
(gdb) r
Starting program: /home/m/wtfdir/wtf 
ąsa

Breakpoint 1, main () at wtf.c:18
18          size_t wn = mbsrtowcs(NULL, &cs, 0, &st);
(gdb) p cs
$1 = 0x555555756260 "ąsa\n"
(gdb) c
Continuing.
/home/m/wtfdir/wtf: first mbsrtowcs: Invalid or incomplete multibyte or wide character
[Inferior 1 (process 5612) exited with code 01]
(gdb) quit

In case it matters, I'm on Linux, and the locale encoding appears to be UTF-8:

m@m-X555LJ:~$ locale charmap
UTF-8

(This is why I expected it to work: trivial programs such as printf("ąsa\n"); print correctly for me on Linux, though not on Windows.)

What am I missing? What am I doing wrong?

