floating-point - Shifting with floats

Question

float a = 1.0 + ((float) (1 << 25));
float b = 1.0 + ((float) (1 << 26));
float c = 1.0 + ((float) (1 << 27));

After running this code, what are the float values of a, b, and c? Explain why the bit layouts of a, b, and c cause each value to end up the way it is.

score 1 · Accepted Answer

After running this code, what are the float values of a, b, and c?

When int is 32 bits, the following integer shifts are well defined and exact. The code is not shifting a float, @EOF.

// OK with 32-bit int
1 << 25
1 << 26
1 << 27

Converting the above power-of-2 values to float is also well defined, with no loss of precision.

// OK and exact
(float) (1 << 25)
(float) (1 << 26)
(float) (1 << 27)

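Adding these to the double 1.0 is well defined and gives an exact sum: because 1.0 is a double constant, each float operand is first converted to double. A typical double has a 53-bit significand and can represent 0x8000001.0p0 exactly, e.g. DBL_MANT_DIG == 53.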

// Let us use hexadecimal FP notation
1.0 + ((float) (1 << 25))  // 0x2000001.0p0 or 0x1.0000008p+25
1.0 + ((float) (1 << 26))  // 0x4000001.0p0 or 0x1.0000004p+26
1.0 + ((float) (1 << 27))  // 0x8000001.0p0 or 0x1.0000002p+27

Finally, the code assigns these double values to a float. They are within the range of a typical float encoding, but cannot be represented exactly.

A typical float has a 24-bit significand, e.g. FLT_MANT_DIG == 24.

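If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. C17dr § 6.3.1.4 2 (§ 6.3.1.5 1 uses the same wording for conversions between floating types, such as double to float).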

A typical implementation-defined manner is round to nearest, ties to even.

  float s = 0x0800001.0p0; printf("%a\n", s);
  float t = 0x1000001.0p0; printf("%a\n", t); // 0x1000001.0p0 is halfway between two floats
  float a = 0x2000001.0p0; printf("%a\n", a);
  float b = 0x4000001.0p0; printf("%a\n", b);
  float c = 0x8000001.0p0; printf("%a\n", c);

Output

0x1.000002p+23   // exact conversion double to float
0x1p+24          
0x1p+25
0x1p+26
0x1p+27

Explain why the bit layouts of a, b, and c cause each value to end up the way it is.

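The bit layout is not the issue. It is the property of float with FLT_MANT_DIG == 24 (a 24-bit significand), together with the implementation-defined rounding, that causes the double values to be rounded to a nearby float value. Any float layout with FLT_MANT_DIG == 24 would give similar results.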
