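FLT_MIN is the constant nearest to zero. How can I get the value nearest to some other number?
As an example:
float nearest_to_1000 = 1000.0f + epsilon;
// epsilon must be the smallest value satisfying the condition:
// nearest_to_1000 > 1000.0f
I would prefer a numeric formula that does not use special functions.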
C provides a function for this in the <math.h> header. nextafterf(x, INFINITY) is the next representable value after x, in the direction toward INFINITY.
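However, if you would prefer to do it yourself:
Assuming IEEE 754, the following returns the epsilon you seek, for single precision (float). See the notes at the bottom about the use of library routines.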
#include <float.h>
#include <math.h>
/* Return the ULP of q.

   This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
   Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
   05.12_, Faculty for Information and Communication Sciences, Hamburg
   University of Technology, November 13, 2005.
*/
float ULP(float q)
{
    // SmallestPositive is the smallest positive floating-point number.
    static const float SmallestPositive = FLT_EPSILON * FLT_MIN;

    /* Scale is .75 ULP, so multiplying it by any significand in [1, 2) yields
       something in [.75 ULP, 1.5 ULP) (even with rounding).
    */
    static const float Scale = 0.75f * FLT_EPSILON;

    q = fabsf(q);

    /* In fmaf(q, -Scale, q), we subtract q*Scale from q, and q*Scale is
       something more than .5 ULP but less than 1.5 ULP. That must produce q
       - 1 ULP. Then we subtract that from q, so we get 1 ULP.

       The significand 1 is of particular interest. We subtract .75 ULP from
       q, which is midway between the greatest two floating-point numbers less
       than q. Since we round to even, the lesser one is selected, which is
       less than q by 1 ULP of q, although 2 ULP of itself.
    */
    return fmaxf(SmallestPositive, q - fmaf(q, -Scale, q));
}
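To make the comments in ULP concrete, here is the arithmetic worked through for q = 1000, the value from the question, assuming IEEE 754 binary32 with round-to-nearest-even. This is only an illustrative check, not part of the cited report.

```latex
\begin{align*}
q &= 1000 = 1.953125 \times 2^{9},
  \qquad \mathrm{ULP}(q) = 2^{9} \times 2^{-23} = 2^{-14} \\
q \cdot \mathrm{Scale} &= 1000 \times 0.75 \times 2^{-23}
  = 1.46484375 \times 2^{-14} \approx 1.46\ \mathrm{ULP} \\
\mathrm{fmaf}(q, -\mathrm{Scale}, q)
  &= \mathrm{round}\bigl(1000 - 1.46484375 \times 2^{-14}\bigr)
  = 1000 - 2^{-14} \\
q - \mathrm{fmaf}(q, -\mathrm{Scale}, q) &= 2^{-14} = 6.103515625 \times 10^{-5}
\end{align*}
```

The rounding step is where the result is decided: 1.46 ULP below q lies closer to q - 1 ULP than to q - 2 ULP, so the fused operation returns q - 1 ULP and the final subtraction recovers exactly 1 ULP.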
The following returns the next value representable in float after the value it is passed (treating -0 and +0 as the same).
#include <float.h>
#include <math.h>
/* Return the next floating-point value after the finite value q.

   This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
   Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
   05.12_, Faculty for Information and Communication Sciences, Hamburg
   University of Technology, November 13, 2005.
*/
float NextAfterf(float q)
{
    /* Scale is .625 ULP, so multiplying it by any significand in [1, 2)
       yields something in [.625 ULP, 1.25 ULP].
    */
    static const float Scale = 0.625f * FLT_EPSILON;

    /* Either of the following may be used, according to preference and
       performance characteristics. In either case, use a fused multiply-add
       (fmaf) to add to q a number that is in [.625 ULP, 1.25 ULP]. When this
       is rounded to the floating-point format, it must produce the next
       number after q.
    */
#if 0
    // SmallestPositive is the smallest positive floating-point number.
    static const float SmallestPositive = FLT_EPSILON * FLT_MIN;

    if (fabsf(q) < 2*FLT_MIN)
        return q + SmallestPositive;
    return fmaf(fabsf(q), Scale, q);
#else
    return fmaf(fmaxf(fabsf(q), FLT_MIN), Scale, q);
#endif
}
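Library routines are used, but fmaxf (maximum of its arguments) and fabsf (absolute value) are easily replaced. fmaf should compile to a hardware instruction on architectures that have fused multiply-add. Failing that, fmaf(a, b, c) in this use can be replaced by (double) a * b + c. (IEEE-754 binary64 has sufficient range and precision to replace fmaf. Other choices for double might not.)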
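The replacements described above might look like the following sketch. It assumes float is IEEE 754 binary32 and double is binary64. The names fma_via_double and next_after_no_fma are hypothetical and used only for this illustration.

```c
#include <float.h>

/* Hypothetical stand-in for fmaf in this particular use. The product and
   the sum are both exact in binary64 here, so the only rounding happens in
   the final conversion to float. This is NOT a general fmaf replacement. */
static float fma_via_double(float a, float b, float c)
{
    return (float)((double)a * b + c);
}

/* Next float after the finite value q, without any <math.h> routines. */
float next_after_no_fma(float q)
{
    /* Scale is .625 ULP, as in NextAfterf. */
    static const float Scale = 0.625f * FLT_EPSILON;

    float magnitude = q < 0 ? -q : q;   /* replaces fabsf */
    if (magnitude < FLT_MIN)            /* replaces fmaxf(..., FLT_MIN) */
        magnitude = FLT_MIN;

    return fma_via_double(magnitude, Scale, q);
}
```

For example, next_after_no_fma(1000.0f) yields 1000.0f + 2^-14, and next_after_no_fma(0.0f) yields the smallest positive subnormal, 2^-149, provided subnormals are not flushed to zero.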
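Another alternative to the fused multiply-add is to add some tests for the cases where q * Scale would be subnormal and handle those cases separately. For the other cases, the multiplication and the addition can be performed separately with the ordinary * and + operators.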
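One way this could be sketched is shown below, assuming IEEE 754 binary32 and round-to-nearest-even. The special cases are handled by scaling by a power of two, which is an assumption of this sketch rather than something the answer specifies. The name next_after_plain and the constants Up and Down are hypothetical.

```c
#include <float.h>

/* Next float after the finite value q, using only ordinary * and +. */
float next_after_plain(float q)
{
    static const float Scale = 0.625f * FLT_EPSILON;
    static const float SmallestPositive = FLT_EPSILON * FLT_MIN;
    static const float Up = 2 / FLT_EPSILON;    /* 2^24 */
    static const float Down = FLT_EPSILON / 2;  /* 2^-24 */

    float magnitude = q < 0 ? -q : q;

    /* Below 2*FLT_MIN, adjacent floats are exactly SmallestPositive apart. */
    if (magnitude < 2 * FLT_MIN)
        return q + SmallestPositive;

    /* Here magnitude * Scale could be subnormal and would lose precision.
       Scale q up by a power of two (exact), take the step where the product
       is normal, then scale back down (also exact, because the result is a
       normal number). */
    if (magnitude < 2 * FLT_MIN * Up) {
        float t = q * Up;
        float m = magnitude * Up;
        return (t + m * Scale) * Down;
    }

    /* The rounded product is still in [.625 ULP, 1.25 ULP], so the sum
       rounds to the next float after q. */
    return q + magnitude * Scale;
}
```

The middle branch matters: for q = 2^-125, for instance, the plain product would round to half an ULP of q, and adding that would produce a tie that rounds back to q itself.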