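I want to write a function that converts a UTF-8 string to UTF-16 (little-endian). The problem is that the iconv function does not seem to let you find out in advance how many bytes are needed to store the output string.
My solution is to first allocate 2*strlen(utf8) bytes, then call iconv in a loop and grow the buffer with realloc whenever necessary: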
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int utf8_to_utf16le(char *utf8, char **utf16, int *utf16_len)
{
    iconv_t cd;
    char *inbuf, *outbuf;
    size_t inbytesleft, outbytesleft, nchars, utf16_buf_len;

    // Canonical hyphenated names; spellings such as "UTF16LE" and "UTF8"
    // are not recognized by every iconv implementation.
    cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1) {
        printf("!%s: iconv_open failed: %d\n", __func__, errno);
        return -1;
    }
    inbytesleft = strlen(utf8);
    if (inbytesleft == 0) {
        printf("!%s: empty string\n", __func__);
        iconv_close(cd);
        return -1;
    }
    inbuf = utf8;
    utf16_buf_len = 2 * inbytesleft; // sufficient in many cases, e.g. if the input string is ASCII
    *utf16 = malloc(utf16_buf_len);
    if (!*utf16) {
        printf("!%s: malloc failed\n", __func__);
        iconv_close(cd);
        return -1;
    }
    outbytesleft = utf16_buf_len;
    outbuf = *utf16;
    nchars = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    while (nchars == (size_t)-1 && errno == E2BIG) {
        char *ptr;
        size_t increase = 10; // increase length a bit
        // Save the write offset *before* realloc: once realloc has moved
        // the block, the old pointer value must no longer be used.
        size_t len = outbuf - *utf16;
        utf16_buf_len += increase;
        outbytesleft += increase;
        ptr = realloc(*utf16, utf16_buf_len);
        if (!ptr) {
            printf("!%s: realloc failed\n", __func__);
            free(*utf16); // realloc failure leaves the old block allocated
            iconv_close(cd);
            return -1;
        }
        *utf16 = ptr;
        outbuf = *utf16 + len; // resume writing where iconv stopped
        nchars = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    }
    if (nchars == (size_t)-1) {
        printf("!%s: iconv failed: %d\n", __func__, errno);
        free(*utf16);
        iconv_close(cd);
        return -1;
    }
    iconv_close(cd);
    *utf16_len = utf16_buf_len - outbytesleft;
    return 0;
}
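Is this really the best way? Repeated reallocs seem wasteful, but without knowing which character sequences the UTF-8 input may contain and what they turn into in UTF-16, I don't know whether I can choose a better initial buffer size than 2*strlen(utf8).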
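For well-formed UTF-8, 2*strlen(utf8) is in fact already an upper bound for UTF-16LE output. A 1-byte sequence (U+0000 to U+007F) becomes one 2-byte code unit, 2- and 3-byte sequences (the rest of the Basic Multilingual Plane) also become one 2-byte unit, and a 4-byte sequence (U+10000 to U+10FFFF) becomes a 4-byte surrogate pair. The output is therefore at most 2 bytes per input byte, with pure ASCII as the worst case, and UTF-16LE is written without a BOM. If an exact size is wanted instead, one pass that counts lead bytes is enough. The following is a minimal sketch under these assumptions: the input is well-formed UTF-8 for counting purposes (validation is left to iconv), and iconv has the POSIX/glibc signature taking char ** for the input. The names utf16_size_for_utf8 and utf8_to_utf16le_exact are hypothetical, not part of any library.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: bytes needed for the UTF-16 encoding of `len`
 * bytes of UTF-8 (no BOM, no terminator). Assumes well-formed UTF-8. */
static size_t utf16_size_for_utf8(const char *s, size_t len)
{
    size_t bytes = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if ((c & 0xC0) == 0x80)
            continue;                 /* continuation byte: no new code point */
        bytes += (c >= 0xF0) ? 4 : 2; /* 4-byte lead -> surrogate pair, else one unit */
    }
    return bytes;
}

/* Hypothetical variant of utf8_to_utf16le: allocate the exact size once,
 * then convert with a single iconv call. Returns 0 on success, -1 on error. */
static int utf8_to_utf16le_exact(char *utf8, char **utf16, size_t *utf16_len)
{
    size_t inleft = strlen(utf8);
    size_t size = utf16_size_for_utf8(utf8, inleft);
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *buf = malloc(size ? size : 1); /* avoid malloc(0) */
    if (!buf) {
        iconv_close(cd);
        return -1;
    }

    char *in = utf8, *out = buf;
    size_t outleft = size;
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1) {
        /* EILSEQ/EINVAL: malformed UTF-8. E2BIG should not occur for
           well-formed input, since the size above is exact. */
        free(buf);
        iconv_close(cd);
        return -1;
    }
    iconv_close(cd);
    *utf16 = buf;
    *utf16_len = size - outleft;
    return 0;
}
```

With the buffer sized exactly, the E2BIG/realloc loop disappears for valid input; the trade-off is one extra linear scan over the UTF-8 bytes before converting.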