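Please take a look at this snippet I wrote. It is supposed to simply convert a multibyte string (read from standard input) into a wide string. After reading the cppreference documentation for mbsrtowcs and mbstate_t, I believed it was valid: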
#include <stdio.h>
#include <wchar.h>
#include <errno.h>
#include <stdlib.h>
#include <error.h>
int main()
{
char *s = NULL; size_t n = 0; errno = 0;
ssize_t sn = getline(&s, &n, stdin);
if(sn == -1 && errno != 0)
error(EXIT_FAILURE, errno, "getline");
if(sn == -1 && errno == 0) // EOF
return EXIT_SUCCESS;
// determine how big the allocated buffer should be
const char* cs = s; mbstate_t st = {0}; // cs is const to avoid compiler warnings
size_t wn = mbsrtowcs(NULL, &cs, 0, &st);
if(wn == (size_t)-1)
error(EXIT_FAILURE, errno, "first mbsrtowcs");
wchar_t* ws = malloc((wn+1) * sizeof(wchar_t));
if(ws == NULL)
error(EXIT_FAILURE, errno, "malloc");
// finally convert the multibyte string to wide string
st = (mbstate_t){0};
if(mbsrtowcs(ws, &cs, wn+1, &st) == (size_t)-1)
error(EXIT_FAILURE, errno, "second mbsrtowcs");
if(printf("%ls", ws) < 0)
error(EXIT_FAILURE, errno, "printf");
return EXIT_SUCCESS;
}
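Yes, this works for ASCII strings. But the whole reason I am dealing with non-ASCII strings is that I want to support diacritics outside the ASCII table, and it fails for those. The first call to mbsrtowcs fails with EILSEQ, which indicates that the multibyte string is invalid. Strangely, though, when I inspect the string with gdb it appears to be valid (as long as gdb displays it correctly). Below is what happens when I feed the program a non-ASCII string, followed by the gdb session: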
m@m-X555LJ:~/wtfdir$ gcc -g -o wtf wtf.c
m@m-X555LJ:~/wtfdir$ ./wtf
asa
asa
m@m-X555LJ:~/wtfdir$ ./wtf
ąsa
./wtf: first mbsrtowcs: Invalid or incomplete multibyte or wide character
m@m-X555LJ:~/wtfdir$ gdb ./wtf
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./wtf...done.
(gdb) break 18
Breakpoint 1 at 0x93b: file wtf.c, line 18.
(gdb) r
Starting program: /home/m/wtfdir/wtf
ąsa
Breakpoint 1, main () at wtf.c:18
18 size_t wn = mbsrtowcs(NULL, &cs, 0, &st);
(gdb) p cs
$1 = 0x555555756260 "ąsa\n"
(gdb) c
Continuing.
/home/m/wtfdir/wtf: first mbsrtowcs: Invalid or incomplete multibyte or wide character
[Inferior 1 (process 5612) exited with code 01]
(gdb) quit
In case it matters, I am on Linux, and the locale encoding appears to be UTF-8:
m@m-X555LJ:~$ locale charmap
UTF-8
(The UTF-8 locale is why I expected this to work, just as a trivial program like printf("ąsa\n"); works for me on Linux but not on Windows.)
What am I missing? What am I doing wrong?