
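The following simple code is supposed to read one wide character from standard input and echo it back to standard output, except that it dies with SIGSEGV in the iconv() call. The question is: what is wrong with the code?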

#include <unistd.h>   /* STDIN_FILENO */
#include <locale.h>   /* LC_ALL, setlocale() */
#include <langinfo.h> /* nl_langinfo(), CODESET */
#include <wchar.h>    /* wchar_t, putwchar() */
#include <iconv.h>    /* iconv_t, iconv_open(), iconv(), iconv_close() */
#include <stdlib.h>   /* malloc(), EXIT_SUCCESS */

int main(void) {
  setlocale(LC_ALL, "");                                            // We initialize the locale
  iconv_t converter = iconv_open("WCHAR_T", nl_langinfo(CODESET));  // We initialize a converter
  wchar_t out;                                                      // We allocate memory for one wide char on stack
  wchar_t* pOut = &out;
  size_t outLeft = sizeof(wchar_t); 

  while(outLeft > 0) {                                              // Until we've read one wide char...
    char in;                                                        // We allocate memory for one byte on stack
    char* pIn=&in;
    size_t inLeft = 1;

    if(read(STDIN_FILENO, pIn, 1) == 0) break;                      // We read one byte from stdin to the buffer
    iconv(&converter, &pIn, &inLeft, (char**)&pOut, &outLeft);      // We feed the byte to the converter
  }

  iconv_close(converter);                                           // We deinitialize a converter
  putwchar(out);                                                    // We echo the wide char back to stdout
  return EXIT_SUCCESS;
}

Update: after making the following change, based on @gsg's answer:

iconv(converter, &pIn, &inLeft, (char**)&pOut, &outLeft);

the code no longer dies with SIGSEGV, but out == L'\n' for any non-ASCII input.
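One plausible cause of the remaining L'\n' symptom, assuming a UTF-8 locale: per POSIX, when the input ends in an incomplete multibyte sequence, iconv() fails with EINVAL and leaves *inbuf pointing at the start of that sequence without consuming it. The loop above then overwrites `in` with the next byte, so the lead byte of a UTF-8 character is lost, the continuation byte on its own is rejected, and the first byte that converts by itself is the newline. A minimal sketch that keeps the pending bytes instead, assuming a glibc-style iconv that accepts the "WCHAR_T" name; the helper read_one_wchar is hypothetical, not part of any library:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <unistd.h>
#include <wchar.h>

/* Hypothetical helper: reads bytes from fd one at a time and re-feeds all
   bytes read so far to iconv until they form one complete character.
   Returns 0 and stores the character in *out on success; returns -1 on EOF,
   read error, or an invalid sequence. */
int read_one_wchar(int fd, iconv_t cd, wchar_t *out)
{
    char buf[16];            /* bytes of the character being assembled */
    size_t pending = 0;

    while (pending < sizeof buf) {
        if (read(fd, &buf[pending], 1) != 1)
            return -1;       /* EOF or read error */
        pending++;

        char *in = buf;                  /* always restart from the first byte */
        size_t in_left = pending;
        char *outp = (char *)out;        /* out is a wchar_t, so it is aligned */
        size_t out_left = sizeof *out;   /* iconv counts space in bytes */

        size_t rc = iconv(cd, &in, &in_left, &outp, &out_left);
        if (out_left == 0)
            return 0;        /* exactly one wchar_t was produced */
        if (rc == (size_t)-1 && errno != EINVAL)
            return -1;       /* EILSEQ or another hard error: give up */

        /* Otherwise the sequence is still incomplete: keep the bytes in buf,
           reset the converter state and retry with one more byte. */
        iconv(cd, NULL, NULL, NULL, NULL);
    }
    return -1;               /* sequence longer than buf */
}
```

Usage would mirror the question: open the converter with `iconv_open("WCHAR_T", nl_langinfo(CODESET))` after `setlocale(LC_ALL, "")`, then call `read_one_wchar(STDIN_FILENO, converter, &out)` and `putwchar(out)` if it returns 0.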

The signature of iconv is

size_t iconv(iconv_t cd,
             char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

But you call it with a first argument of pointer to iconv_t:

iconv(&converter, &pIn, &inLeft, (char**)&pOut, &outLeft);

Which should be

iconv(converter, &pIn, &inLeft, (char**)&pOut, &outLeft);

An interesting question is why a warning is not generated. For that, let's look at the definition in iconv.h:

/* Identifier for conversion method from one codeset to another.  */
typedef void *iconv_t;

That's an... unfortunate choice.
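A small sketch of why the compiler stays quiet, using a hypothetical stand-in type rather than the real header: any object pointer converts implicitly to void *, so `&converter` (an `iconv_t *`) is accepted where an `iconv_t` is expected, and the callee then treats the address of the caller's local variable as the descriptor itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for glibc's `typedef void *iconv_t;`. */
typedef void *handle_t;

/* Hypothetical API that, like iconv(), takes the handle by value. */
const void *use_handle(handle_t h)
{
    return h;                /* the pointer the callee actually received */
}

/* Passing &h (a handle_t *) where a handle_t is expected compiles without a
   diagnostic, because handle_t * converts implicitly to void *. The callee
   receives the address of the caller's variable instead of the handle
   stored in it. Returns 1 if that is what happened. */
int wrong_call_is_silent(void)
{
    handle_t h = NULL;
    return use_handle(&h) == (const void *)&h;
}
```

Had the header used an opaque struct pointer instead (for example `typedef struct iconv_state *iconv_t;`, a hypothetical name), passing `&converter` would be an incompatible-pointer constraint violation that a conforming compiler must diagnose.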

I would program this a bit differently:

#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <unistd.h>
#include <locale.h>
#include <langinfo.h>
#include <wchar.h>
#include <iconv.h>
#include <stdlib.h>
#include <err.h>

int main(void)
{
    iconv_t converter;
    char input[8]; /* enough space for a multibyte char */
    wchar_t output[8];
    char *pinput = input;
    char *poutput = (char *)&output[0];
    ssize_t bytes_read;
    size_t error;
    size_t input_bytes_left, output_bytes_left;

    setlocale(LC_ALL, "");

    converter = iconv_open("WCHAR_T", nl_langinfo(CODESET));
    if (converter == (iconv_t)-1)
        err(2, "failed to open iconv_t");

    bytes_read = read(STDIN_FILENO, input, sizeof input);
    if (bytes_read < 0)
        err(2, "bad read");
    if (bytes_read == 0)
        errx(2, "no input");
    input_bytes_left = bytes_read;
    output_bytes_left = sizeof output;

    error = iconv(converter,
                  &pinput, &input_bytes_left,
                  &poutput, &output_bytes_left);
    if (error == (size_t)-1)
        err(2, "failed conversion");

    printf("%lc\n", (wint_t)output[0]);

    iconv_close(converter);
    return EXIT_SUCCESS;
}
answered 2013-11-04T09:58:50.430

I'm by no means an expert, but here is an example that does what you seem to be trying to do:

http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html

From the site:

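The example also shows the problem of using wide character strings with iconv. As explained in the description of the iconv function above, the function always takes a pointer to a char array, and the available space is measured in bytes. In the example, the output buffer is a wide character buffer; therefore, we use a local variable wrptr of type char *, which is used in the iconv calls.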

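This looks rather innocent but can lead to problems on platforms that have tight restrictions on alignment. Therefore the caller of iconv has to make sure that the pointers passed are suitable for accessing characters from the appropriate character set. Since, in the above case, the input parameter to the function is a wchar_t pointer, this is the case (unless the user violates alignment when computing the parameter). But in other situations, especially when writing generic functions where one does not know what type of character set one uses and therefore treats text as a sequence of bytes, it might become tricky.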
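The pattern the manual describes can be sketched as follows, assuming a glibc-style iconv that accepts the "WCHAR_T" name (the helper utf8_to_wcs is hypothetical): the destination is a genuine wchar_t array, so its alignment is correct, and only a separate char * cursor (wrptr, as in the manual) is handed to iconv, with the space counted in bytes.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts srclen bytes of UTF-8 into at most dstlen
   wide characters. Returns the number of wchar_t written, or (size_t)-1 on
   error (including an incomplete or invalid sequence). */
size_t utf8_to_wcs(const char *src, size_t srclen, wchar_t *dst, size_t dstlen)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;                  /* iconv only reads through inbuf */
    size_t in_left = srclen;
    char *wrptr = (char *)dst;               /* byte cursor into a wchar_t buffer */
    size_t out_left = dstlen * sizeof *dst;  /* space is measured in bytes */

    size_t rc = iconv(cd, &in, &in_left, &wrptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes written divided by the element size gives the wchar_t count. */
    return (size_t)(wrptr - (char *)dst) / sizeof *dst;
}
```

Because dst keeps its wchar_t type, every address wrptr reaches at a character boundary is suitably aligned for a wchar_t, which is the guarantee the manual relies on.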

Essentially, iconv has alignment pitfalls. In fact, some bugs about this issue have already been reported:

http://lists.debian.org/debian-glibc/2007/02/msg00043.html

Hope this at least gets you started. I would try using a char * instead of a wchar_t * for pOut, as the example does.

answered 2013-11-04T09:32:18.690