I have read through the C11 standard, section 7.21 where <stdio.h> is described. The standard first describes streams as:

7.21.2.2:

A text stream is an ordered sequence of characters ...

7.21.2.3:

A binary stream is an ordered sequence of characters ...

This doesn't specify the type of the stream's characters (since that depends on the stream's orientation). Later it says:

7.21.3.12:

... The byte output functions write characters to the stream as if by successive calls to the fputc function.

From fputc (7.21.7.3.2):

The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream ...

This indicates that the int c argument of fputc is converted to an unsigned char before being written to the stream. A similar note is given for fgetc:

7.21.7.1.2:

the fgetc function obtains that character as an unsigned char converted to an int

The same applies to ungetc, fread and fwrite.

Now this all hints that internally, a byte-oriented stream is represented by unsigned chars.

However, looking at the internals of the Linux kernel, files seem to be treated as streams of char. One reason I say this is that the file_operations read and write callbacks take char __user * and const char __user * arguments, respectively.

In glibc, FILE is a typedef of struct _IO_FILE, which is defined in libio/libio.h. In this struct, too, all the read and write pointers are char *.

In C++, basic_ostream::write takes a const char * as input, and basic_istream::read similarly takes a char * (but I'm not interested in C++ in this question).

My question is: do the quotes above imply that FILE streams should be treated as streams of unsigned char? If so, why do glibc and the Linux kernel implement them with char *? If not, why does the standard insist on converting the characters to unsigned char?

3 Answers

It doesn't really matter. The standard uses unsigned char in a few selected places because doing so allows a precise formulation there:

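  • fgetc is specified to return an unsigned char converted to an int. The result is therefore known to be positive or zero, except when it is EOF, so EOF cannot be confused with a valid character. (That confusion is behind a classic bug: storing the result of fgetc directly in a char without first checking for EOF.)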

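  • fputc is specified to take an int and convert it to unsigned char, because that conversion is fully specified. If one were not careful, a formulation that did not use unsigned char could give code like the following undefined (or at least implementation-defined) behavior: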

    int c = fgetc(stdin);
    if (c != EOF)
        fputc(c, stdout);
    

The problematic case is a negative character value when plain char is signed.

answered 2012-09-23 18:11

It doesn't really matter. A char is CHAR_BIT bits long (defined in limits.h, usually 8), whether it's signed or not.

Those functions work with CHAR_BIT-bit chunks, so signedness makes no difference to the writing or reading itself.

You can then use signed or unsigned chars, depending on your application logic, by casting the result appropriately. The human-readable interpretation differs depending on signedness, but for the processor the representation does not change: it's still bytes.

answered 2012-09-23 17:52

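The only thing you can observe directly (without inspecting the source) is what the API returns. Whatever lies behind it is hidden by the black-box abstraction and shouldn't concern you.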

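As for the other part of your question: the standard has to mention the conversion because the parameters and return values are int while the stream is a sequence of characters.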

answered 2012-09-23 18:02