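Why is floating point preferred for precision? Couldn't very large integers be used to represent the precision that float gives, while also being deterministic on all machines? For example, an object moving 0.48124 meters in floating point could instead be represented as an object moving 48124 micrometers in an int or long.
5 Answers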
Floating-point is preferred over integers for some calculations because:
- When you multiply in a fixed-point format, the product has a new scale, so it must be adjusted or the code must be written to account for the changed scale. For example, if you adopt a format scaled by 100, so that .3 is represented with 30 and .4 is represented with 40, then multiplying 30 by 40 produces 1200, but the correct answer at the same scale should be 12 (representing .12). Division needs a similar adjustment.
- When the integer format overflows, many machines and programming languages do not have good support for getting the most significant portion of the result. Floating-point automatically produces the most significant portion of the result and rounds the discarded bits.
- Integer arithmetic usually truncates fractions, but floating-point rounds them (unless requested otherwise).
- Some calculations involve a large range of numbers, including both numbers that are very large and very small. A fixed-point format has a small range, but a floating-point format has a large range. You could manually track the scale with a fixed-point format, but then you are merely implementing your own floating-point using integers.
- Many machines and/or programming languages ignore integer overflow, but floating-point can handle overflow gracefully and/or provide notifications when it occurs.
- Floating-point arithmetic is well defined and generally well implemented; bugs in it have been reduced (sometimes by painful experience). Building new do-it-yourself arithmetic is prone to bugs.
- For some functions, it is difficult to predict the scale of the result in advance, so it is awkward to use a fixed-point format. For example, consider sine. Whenever the input is near a multiple of π, sine is near zero. Because π is irrational (and transcendental), the pattern of which integers or fixed-point numbers are near multiples of π is very irregular. Some fixed-point numbers are not near multiples of π, and their sines are around .1, .5, .9, et cetera. Some fixed-point numbers are very near multiples of π, and their sines are close to zero. A few are very close to multiples of π, and their sines are tiny. Because of this, there is no fixed-point format of reasonable precision that can always return the result of sine without either underflowing or overflowing.
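To make the rescaling and truncation points in the list concrete, here is a minimal sketch using the scale-by-100 format from the first bullet. The names `SCALE`, `fixed_mul_truncate`, and `fixed_mul_round` are made up for illustration, and the sketch assumes non-negative inputs.

```python
SCALE = 100  # assumed format: the integer 30 stands for 0.30

def fixed_mul_truncate(a, b):
    """Multiply two non-negative scaled integers, then drop the extra scale factor."""
    return (a * b) // SCALE

def fixed_mul_round(a, b):
    """Same as above, but round to nearest instead of truncating."""
    return (a * b + SCALE // 2) // SCALE

a, b = 30, 40                         # 0.30 and 0.40
raw_product = a * b                   # 1200: this is at scale SCALE * SCALE
product = fixed_mul_truncate(a, b)    # 12, representing 0.12

c, d = 25, 30                         # 0.25 * 0.30 = 0.075
truncated = fixed_mul_truncate(c, d)  # 7, representing 0.07
rounded = fixed_mul_round(c, d)       # 8, representing 0.08
```

Every multiplication site has to remember the rescale step and choose a rounding rule, which is the bookkeeping floating-point hardware does for you.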
You are asking about floating-point versus long. A 64-bit integer might have advantages over a 32-bit floating-point format in some situations, but often the proper comparison is for comparable sizes, such as 32-bit integer to 32-bit floating-point and 64-bit integer to 64-bit floating-point. In these cases, the question is whether the benefits of a dynamic scale outweigh the loss of a few bits of precision.
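As a rough illustration of that trade-off at 32 bits, the sketch below emulates single precision by round-tripping a Python float through `struct`. The helper name `f32` is an assumption of this example, not a standard function.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

INT32_MAX = 2**31 - 1

# Precision: a 32-bit integer holds 2**24 + 1 exactly, single precision does not.
big = 2**24 + 1                 # 16777217
as_float = f32(float(big))      # rounds to 16777216.0

# Range: single precision reaches far beyond what a 32-bit integer can hold.
huge = f32(1e30)                # still finite in single precision
fits_in_int32 = huge <= INT32_MAX
```

The float gives up about 8 bits of precision (24 significant bits instead of 32) in exchange for the much larger range.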
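It would be 481.24 millimeters, which is part of where the problem arises. With integers (or longs), you are very likely to run into some kind of rounding. Maybe your program guarantees that the smallest unit you care about is millimeters, but that still makes for a somewhat ugly convention for writing units. It is not hard to work out that 100000 millimeters == 100 meters, but it is not as immediately obvious as 100.000, and in applications where you mostly deal in meters or kilometers but still need the precision, the integer form is a lot more annoying to read than 3463.823.
Also, in many cases you do care about digits beyond the inconveniently small, and while floating point lets you trim the number of digits displayed, the data is still there, so 3.141592653 (and so on, up to whatever the floating-point precision is) trimmed to 3.14 meters is easier to work with than 3141592653 nanometers.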
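A small sketch of both points, using the numbers from the question and this answer. The variable names are illustrative only.

```python
meters = 0.48124

# Storing whole millimeters loses the fractional part (481.24 mm becomes 481).
millimeters = int(meters * 1000)

# Whole micrometers do hold the value, but it is 481240, not 48124.
micrometers = round(meters * 1_000_000)

# Floating point keeps all the digits and only trims them for display.
length_m = 3.141592653
shown = f"{length_m:.2f}"             # "3.14"

# The integer version has to be converted back before it reads naturally.
length_nm = 3141592653
shown_from_nm = f"{length_nm / 1e9:.2f}"
```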
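Actually, in some applications integers are preferred, for several reasons. In particular, unlike floating point, integers are translation invariant: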
x1 - x2 == (x1 - displacement) - (x2 - displacement)
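This is very important in some geometry engines. For example, if you are computing a large grid of identical shapes determined by some parameters, you group the shapes into sets with identical parameters, compute what happens for one representative of each set, and copy the result to the other shapes with the same parameters. Translation invariance ensures that this optimization is faithful.
Floating point, on the other hand, is not translation invariant: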
0.0002 - 0.0001 != (0.0002 - 1000000) - (0.0001 - 1000000) // this example in single precision
This sometimes leads to very nasty surprises that are hard to debug.
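The single-precision behavior can be checked with a short sketch. Python floats are doubles, so the hypothetical helpers `f32` and `f32_sub` below emulate single precision by rounding through `struct` after each step; the integer half assumes the same lengths stored as whole micrometers.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_sub(a, b):
    """Subtract the way single precision would: round inputs, then the result."""
    return f32(f32(a) - f32(b))

x1, x2, displacement = 0.0002, 0.0001, 1000000.0

direct = f32_sub(x1, x2)
# Near 1000000 the spacing between single-precision values is 0.0625,
# so both shifted values round to exactly -1000000.0.
shifted = f32_sub(f32_sub(x1, displacement), f32_sub(x2, displacement))

# Integers: the same subtraction, shifted by the same amount, is exact.
i1, i2, i_disp = 200, 100, 10**12
int_direct = i1 - i2
int_shifted = (i1 - i_disp) - (i2 - i_disp)
```

Here `direct` is about 0.0001 while `shifted` is exactly 0.0, whereas the two integer results agree.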
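The degree of deterministic behavior is independent of the data representation. It just takes a longer specification to define floating-point math precisely than integer math, and implementing it is messier.
IEEE floating point strives to make floating point deterministic on all machines.
Integers can be 1's or 2's complement and come in various widths, so they are not deterministic for some calculations. So integer math has troubles of its own.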
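As a sketch of the width problem, the helper below (a made-up name, `wrap_twos_complement`) emulates what a two's complement machine that silently wraps on overflow would store for the same sum at two different widths. Real languages differ here; in C, for instance, signed overflow is undefined rather than guaranteed to wrap.

```python
def wrap_twos_complement(value, bits):
    """Reduce value modulo 2**bits and read it as a signed two's complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

total = 30000 + 30000                      # mathematically 60000

as_16 = wrap_twos_complement(total, 16)    # wraps to -5536
as_32 = wrap_twos_complement(total, 32)    # stays 60000
```

The same source-level addition gives different answers depending on the integer width the platform picks.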
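Yes, big integers can be, and have been, used as the OP suggests. But as @Eric Postpischil points out, the benefits of FP are many. Big integers are used in select situations, including cryptography.
Also note the upcoming decimal floating-point standards, meant to deal with banking and the like.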
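Although many kinds of code could use some kind of fixed-point value more efficiently than floating point, different kinds of code and different situations have different requirements. Some situations require storing numbers from zero to a million accurate to one-thousandth; others require storing numbers from zero to one accurate to one-billionth. A fixed-point format that is barely adequate for some purposes will be vast overkill for others. If a language could efficiently support working with numbers stored in a wide variety of formats, fixed-point math could have some very large advantages. On the other hand, it is generally easier for a language to support one to three floating-point formats than to support the many fixed-point formats that would otherwise be needed. Further, unless a language makes it possible for a single routine to work with a variety of fixed-point formats, using general-purpose math routines tends to be difficult. Perhaps compiler technology has advanced far enough that using a variety of fixed-point types could be practical, but hardware floating-point technology has advanced far enough to largely eliminate the need for such a thing.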
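A rough way to see why one fixed-point format does not fit both cases is to count the bits each would need. The function name `bits_needed` is hypothetical, and the sketch assumes unsigned values.

```python
def bits_needed(max_value, steps_per_unit):
    """Bits for an unsigned fixed-point integer covering 0..max_value at the given resolution."""
    return (max_value * steps_per_unit).bit_length()

# Zero to a million, accurate to one-thousandth.
wide_range = bits_needed(1_000_000, 1_000)

# Zero to one, accurate to one-billionth.
fine_resolution = bits_needed(1, 1_000_000_000)

# A single format that serves both: zero to a million in billionths.
combined = bits_needed(1_000_000, 1_000_000_000)
```

Each case fits in 30 bits with its own scale, but a shared format needs 50 bits, so a generic routine either takes the scale as an extra parameter or pays for the wider type everywhere.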