c - 逐字节打印 4 字节整数时的意外行为

Question

我有这个将 32 位整数转换为 ip 地址的示例代码。


#include <stdio.h>
int main()
{
 unsigned int c ;
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Integer value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

此代码为 input 提供了不正确的输出2249459722。但是当我更换

scanf("%d",&c) ;

经过

scanf("%u",&c) ;

输出结果是正确的。

PS：我知道inet_ntopand inet_pton。
除了提出建议之外，我还期待其他答案。

score 13 · Accepted Answer

You are coding 'sinfully' (making a number of mistakes which will hurt you sooner or later - mostly sooner). First off, you are assuming that the integer is of the correct endian-ness. On some machines, you will be wrong - either on Intel machines or on PowerPC or SPARC machines.

In general, you should show the actual results you get rather than just saying that you get the wrong result; you should also show the expected result. That helps people debug your expectations.

Here's my modified version of your code - instead of requesting input, it simply assumes the value you specified.

#include <stdio.h>
int main(void)
{
    unsigned int c = 2249459722;
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
    return(0);
}

When compiled on my Mac (Intel, little-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

When compiled on my Sun (SPARC, big-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 134.20.8.10

(Using GCC 4.4.2 on the SPARC, I get a warning:

xx.c:4: warning: this decimal constant is unsigned only in ISO C90

Using GCC 4.2.1 on Mac - with lots of warnings enabled (gcc -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Werror) - I don't get that warning, which is interesting.) I can remove that by adding a U suffix to the integer constant.

Another way of looking at the problems is illustrated with the following code and the extremely fussy compiler settings shown above:

#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 0)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}

This fails to compile (because of the -Werror setting) with the message:

cc1: warnings being treated as errors
xx.c: In function ‘main’:
xx.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’

Remove the -Werror setting and it compiles, but then shows the next problem that you have - the one of not checking for error indications from functions that can fail:

Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Conversion failed for 2249459722

Basically, the sscanf() function reports that it failed to convert the string to a signed integer (because the value is too large to fit - see the warning from GCC 4.4.2), but your code was not checking for the error return from sscanf(), so you were using whatever value happened to be left in c at the time.

So, there are multiple problems with your code:

It assumes a particular architecture (little-endian rather than recognizing that big-endian also exists).
It doesn't compile cleanly when using a compiler with lots of warnings enabled - for good reason.
It doesn't check that functions that can fail actually succeeded.

Alok's Comment

Yes, the test on sscanf() is wrong. That's why you have code reviews, and also why it helps to post the code you are testing.

I'm now a bit puzzled - getting consistent behaviour that I can't immediately explain. With the obvious revision (testing on MacOS X 10.6.2, GCC 4.2.1, 32-bit and 64-bit compilations), I get one not very sane answer. When I rewrite more modularly, I get a sane answer.

+ cat yy.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 1)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}


+ gcc -o yy.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

I do not have a good explanation for the value 170.142.21.134; but it is consistent on my machine, at the moment.

+ gcc -o yy.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

Same value - even in 64-bit instead of 32-bit. Maybe the problem is that I'm trying to explain undefined behaviour, which is more or less by definition unexplainable (inexplicable).

+ cat xx.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

static void scan_value(const char *str, const char *fmt, const char *tag)
{
    unsigned int c;
    printf("Indirect operations (%s):\n", tag);
    fmt = "%d";
    if (sscanf(str, fmt, &c) != 1)
        printf("Conversion failed for %s (format %s \"%s\")\n", str, tag, fmt);
    else
        print_value(c);
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722U;

    printf("Direct operations:\n");
    print_value(c);
    scan_value(str, "%d", "signed");
    scan_value(str, "%u", "unsigned");

    return(0);
}

Using the function argument like this means GCC cannot spot the bogus format any more.

+ gcc -o xx.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

The results are consistent here.

+ gcc -o xx.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

And these are the same as the 32-bit case. I'm officially bemused. The main observations remain accurate - be careful, heed compiler warnings (and elicit compiler warnings), and don't assume that "all the world runs on Intel chips" (it used to be "don't assume that all the world is a VAX", once upon a long time ago!).

score 5 · Accepted Answer

%d 用于有符号整数

%u 用于无符号整数

编辑：

请按如下方式修改您的程序，以查看您的输入是如何被真正解释的：

#include <stdio.h>
int main()
{
 unsigned int c ; 
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Signed value: %d\n",c);
  printf("Unsigned value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

当您提供一个大于 INT_MAX 的数字时会发生什么，最左边的位是 1。这表明它是一个带负值的有符号整数。然后将该数字解释为它的二进制补码

score 1 · Accepted Answer

要回答您的主要问题：

scanf("%d", &c);

scanf()当被转换的输入不能表示为数据类型时，的行为是未定义的。 2249459722在您的机器上不适合 . int，因此scanf()可以做任何事情，包括将垃圾存储在c.

在 C 中，inttype 保证能够将值存储在 to 范围-32767内+32767。An是和unsigned int之间的保证值。因此，即使是. 但是，可以存储高达(2 ³² −1) 的值，因此您应该使用：0655352249459722unsigned intunsigned long4294967295unsigned long

#include <stdio.h>
int main()
{
    unsigned long c ;
    unsigned char *cptr  = (unsigned char*)&c ;
    while(1)
    {
        if (scanf("%lu", &c) != 1) {
            fprintf(stderr, "error in scanf\n");
            return 0;
        }
        printf("Input value: %lu\n", c);
        printf("%u.%u.%u.%u\n", cptr[0], cptr[1], cptr[2], cptr[3]);
    }
    return 0;
}

如果您有 C99 编译器，则可以#include <inttypes.h>然后使用uint32_t而不是unsigned long. 通话scanf()变成scanf("%" SCNu32, &c);

score 1 · Accepted Answer

写这个的正确的字节顺序安全方法是

printf("Dotted decimal: %u.%u.%u.%u \n", (c >> 24) & 0xff, (c >> 16) & 0xff, (c >> 8) & 0xff, (c >> 0) & 0xff);

c - 逐字节打印 4 字节整数时的意外行为

4 回答 4

Alok's Comment

编辑：

Related

Reference