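I'm trying to create an application that stores stock prices with high precision. Currently I'm using a double to do so. To save memory, can I use any other data type? I know this has something to do with fixed-point arithmetic, but I can't figure it out.
4 Answers
The idea behind fixed-point arithmetic is that you store the values multiplied by a certain amount, use the multiplied values for all calculations, and divide by the same amount when you want the result. The purpose of this technique is to use integer arithmetic (int, long...) while still being able to represent fractions.
The usual and most efficient way of doing this in C is with the bit-shift operators (<< and >>). Shifting bits is a simple and fast operation for the ALU, and each shift multiplies (<<) or divides (>>) the integer value by 2 (besides, many shifts can be done for exactly the same price as a single one). Of course, the drawback is that the multiplier must be a power of 2 (which is usually not a problem in itself, as we don't really care about the exact multiplier value).
Now let's say we want to use 32-bit integers for storing our values. We must choose a power-of-2 multiplier. Let's divide the cake in two, so say 65536 (this is the most common case, but you can really use any power of 2 depending on your precision needs). This is 2^16, and the 16 here means that we will use the 16 least significant bits (LSB) for the fractional part. The rest (32 - 16 = 16) are the most significant bits (MSB), the integer part.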
  integer (MSB)    fraction (LSB)
        v                 v
0000000000000000.0000000000000000
Let's put this in code:
#define SHIFT_AMOUNT 16 // 2^16 = 65536
#define SHIFT_MASK ((1 << SHIFT_AMOUNT) - 1) // 65535 (all LSB set, all MSB clear)
int price = 500 << SHIFT_AMOUNT;
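This is the value you must store (in a structure, a database, wherever). Note that int is not necessarily 32 bits in C, even though that is mostly the case nowadays. Also, without further declaration, it is signed by default. You can add unsigned to the declaration to be sure. Better than that, you can use uint32_t or uint_least32_t (declared in stdint.h) if your code depends heavily on the integer bit size (you may introduce some hacks around it). When in doubt, use a typedef for your fixed-point type and you're safer.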
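When you want to do calculations on this value, you can use the 4 basic operators: +, -, * and /. Keep in mind that when adding or subtracting a value (+ and -), that value must also be shifted. Let's say we want to add 10 to our price of 500: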
price += 10 << SHIFT_AMOUNT;
But for multiplication and division (* and /), the multiplier/divisor must NOT be shifted. Let's say we want to multiply by 3:
price *= 3;
Now let's make things more interesting by dividing the price by 4, so that we end up with a non-zero fractional part:
price /= 4; // now our price is ((500 + 10) * 3) / 4 = 382.5
That's all there is to the rules. When you want to retrieve the real price at any point, you must right-shift:
printf("price integer is %d\n", price >> SHIFT_AMOUNT);
If you need the fractional part, you must mask it out:
printf ("price fraction is %d\n", price & SHIFT_MASK);
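Of course, this value is not what we would call a decimal fraction; it is in fact an integer in the range [0 - 65535]. But it maps exactly onto the decimal fraction range [0 - 0.9999...]. In other words, the mapping looks like: 0 => 0, 32768 => 0.5, 65535 => 0.9999...
An easy way to see it as a decimal fraction is to resort to C's built-in floating-point arithmetic at this point: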
printf("price fraction in decimal is %f\n", ((double)(price & SHIFT_MASK) / (1 << SHIFT_AMOUNT)));
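But if you don't have FPU support (either hardware or software), you can use your new skills like this for the complete price:
printf("price is roughly %d.%05lld\n", price >> SHIFT_AMOUNT, (long long)(price & SHIFT_MASK) * 100000 / (1 << SHIFT_AMOUNT)); // %05lld keeps leading zeros, e.g. .05000 instead of .5000
The number of 0's in the expression is roughly the number of digits you want after the decimal point, and the field width in the format (the 5 in %05lld) must match it so that leading zeros of the fraction are printed. Don't overestimate the number of 0's given your fraction precision (no real trap here, that's quite obvious). Don't use a plain long, as sizeof(long) can be equal to sizeof(int). Use long long if int is 32 bits, since long long is guaranteed to be at least 64 bits (or use int64_t, int_least64_t and such, declared in stdint.h). In other words, use a type twice the size of your fixed-point type; that's fair enough. Finally, if you don't have access to >= 64-bit types, maybe it's time to practice emulating them, at least for your output.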
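These are the basic ideas behind fixed-point arithmetic.
Be careful with negative values. They can get tricky, especially when it's time to display the final value. Besides, C leaves part of this to the implementation: right-shifting a negative signed integer is implementation-defined, and left-shifting one is undefined behavior (even though platforms where this causes trouble are very uncommon nowadays). You should always run minimal tests in your environment to make sure everything goes as expected. If it doesn't, you can hack around it if you know what you are doing (I won't develop this here, but it has to do with arithmetic shift vs logical shift and 2's complement representation). With unsigned integers, however, you are mostly safe whatever you do, as the behavior is well defined anyway.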
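Also note that, just as a 32-bit integer can't represent values bigger than 2^32 - 1, fixed-point arithmetic with a 2^16 multiplier limits the integer part of your range to 2^16 - 1! (And halve all of this with signed integers, which in our example leaves an available range of 2^15 - 1.) The goal is then to choose a SHIFT_AMOUNT suited to the situation. It is a tradeoff between the magnitude of the integer part and the precision of the fractional part.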
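Now for the real warning: this technique is definitely not suitable in areas where precision is a top priority (finance, science, military...). The usual floating-point types (float/double) are often not precise enough either, even though they have better properties than fixed-point overall. Fixed-point has the same precision whatever the value (which can be an advantage in some cases), whereas floating-point precision is inversely proportional to the magnitude of the value (i.e. the lower the magnitude, the more precision you get... well, it is more complex than that, but you get the point). Floats also cover a much greater magnitude than integers with the same number of bits (fixed-point or not), at the cost of losing precision for high values (you can even reach a magnitude where adding 1, or even greater values, has no effect at all).
If you work in those sensitive areas, you are better off using libraries dedicated to arbitrary precision (go take a look at gmplib, it's free). In computing science, essentially, gaining precision comes down to the number of bits you use to store your values. You want high precision? Use bits. That's all.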
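I see two options for you. If you work in the financial services industry, there are probably standards your code should comply with for precision and accuracy, so you'll just have to go along with them, regardless of memory cost. I understand that business is generally well funded, so paying for more memory shouldn't be a problem. :)
If this is for personal use, then for maximum precision I recommend you use integers and multiply all prices by a fixed factor before storage. For example, if you want things accurate to the penny (probably not good enough), multiply all prices by 100 so that your unit is effectively cents instead of dollars, and go from there. If you want more precision, multiply by more. For example, to be accurate to the hundredth of a cent (a standard that I have heard is commonly applied), multiply prices by 10000 (100 * 100).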
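Now with 32-bit integers, multiplying by 10000 leaves little room for large dollar amounts. A practical 32-bit limit of 2 billion means that only prices up to about $200000 can be expressed: 2000000000 / 10000 = 200000. It gets worse if you multiply such a value by something, as there may be no room to hold the result. For this reason, I recommend using 64-bit integers (long long). Even if you multiply all prices by 10000, there is still plenty of headroom for large values, even across multiplications.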
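The trick with fixed-point is that whenever you do a calculation, you need to remember that each value is really an underlying value multiplied by a constant. Before you add or subtract, you need to multiply the values with the smaller constant so they match those with the bigger constant. After you multiply, you need to divide by something to bring the result back to being multiplied by the desired constant. If you use a non-power of two as your constant, you'll have to do an integer divide, which is expensive time-wise. Many people use powers of two as their constants, so they can shift instead of divide.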
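If all this seems complicated, it is. I think the easiest option is to use doubles and buy more RAM if you need it. They have 53 bits of precision, which is roughly 9 quadrillion (about 9 x 10^15), or almost 16 decimal digits. Yes, you still might lose pennies when you are working with billions, but if you care about that, you're not being a billionaire the right way. :)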
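@Alex gave a fantastic answer here. However, I wanted to add some improvements to what he has done, for example by demonstrating how to do emulated-float rounding (using integers to act like floats) to any desired decimal place. I demonstrate that in my code below. I went a lot further, though, and ended up writing a whole code tutorial to teach myself fixed-point math. Here it is:
My fixed_point_math tutorial: a tutorial-like practice code to learn how to do fixed-point math, manual "float"-like prints using integers only, "float"-like integer rounding, and fractional fixed-point math on large integers.
If you really want to learn fixed-point math, I think this is valuable code to go through carefully, but it took me an entire weekend to write, so expect it to take you perhaps a couple of hours to work through it all. The basics of the rounding stuff can be found right in the top section, however, and learned in just a few minutes.
My full code on GitHub: https://github.com/ElectricRCAircraftGuy/fixed_point_math.
Or, below (truncated, because Stack Overflow won't allow that many characters):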
/*
fixed_point_math tutorial
- A tutorial-like practice code to learn how to do fixed-point math, manual "float"-like prints using integers only,
"float"-like integer rounding, and fractional fixed-point math on large integers.
By Gabriel Staples
www.ElectricRCAircraftGuy.com
- email available via the Contact Me link at the top of my website.
Started: 22 Dec. 2018
Updated: 25 Dec. 2018
References:
- https://stackoverflow.com/questions/10067510/fixed-point-arithmetic-in-c-programming
Commands to Compile & Run:
As a C program (the file must NOT have a C++ file extension or it will be automatically compiled as C++, so we will
make a copy of it and change the file extension to .c first):
See here: https://stackoverflow.com/a/3206195/4561887.
cp fixed_point_math.cpp fixed_point_math_copy.c && gcc -Wall -std=c99 -o ./bin/fixed_point_math_c fixed_point_math_copy.c && ./bin/fixed_point_math_c
As a C++ program:
g++ -Wall -o ./bin/fixed_point_math_cpp fixed_point_math.cpp && ./bin/fixed_point_math_cpp
*/
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
// Define our fixed point type.
typedef uint32_t fixed_point_t;
#define BITS_PER_BYTE 8
#define FRACTION_BITS 16 // 1 << 16 = 2^16 = 65536
#define FRACTION_DIVISOR (1 << FRACTION_BITS)
#define FRACTION_MASK (FRACTION_DIVISOR - 1) // 65535 (all LSB set, all MSB clear)
// // Conversions [NEVERMIND, LET'S DO THIS MANUALLY INSTEAD OF USING THESE MACROS TO HELP ENGRAIN IT IN US BETTER]:
// #define INT_2_FIXED_PT_NUM(num) (num << FRACTION_BITS) // Regular integer number to fixed point number
// #define FIXED_PT_NUM_2_INT(fp_num) (fp_num >> FRACTION_BITS) // Fixed point number back to regular integer number
// Private function prototypes:
static void print_if_error_introduced(uint8_t num_digits_after_decimal);
int main(int argc, char * argv[])
{
printf("Begin.\n");
// We know how many bits we will use for the fraction, but how many bits are remaining for the whole number,
// and what's the whole number's max range? Let's calculate it.
const uint8_t WHOLE_NUM_BITS = sizeof(fixed_point_t)*BITS_PER_BYTE - FRACTION_BITS;
const fixed_point_t MAX_WHOLE_NUM = (1 << WHOLE_NUM_BITS) - 1;
printf("fraction bits = %u.\n", FRACTION_BITS);
printf("whole number bits = %u.\n", WHOLE_NUM_BITS);
printf("max whole number = %u.\n\n", MAX_WHOLE_NUM);
// Create a variable called `price`, and let's do some fixed point math on it.
const fixed_point_t PRICE_ORIGINAL = 503;
fixed_point_t price = PRICE_ORIGINAL << FRACTION_BITS;
price += 10 << FRACTION_BITS;
price *= 3;
price /= 7; // now our price is ((503 + 10)*3/7) = 219.857142857.
printf("price as a true double is %3.9f.\n", ((double)PRICE_ORIGINAL + 10)*3/7);
printf("price as integer is %u.\n", price >> FRACTION_BITS);
printf("price fractional part is %u (of %u).\n", price & FRACTION_MASK, FRACTION_DIVISOR);
printf("price fractional part as decimal is %f (%u/%u).\n", (double)(price & FRACTION_MASK) / FRACTION_DIVISOR,
price & FRACTION_MASK, FRACTION_DIVISOR);
// Now, if you don't have float support (neither in hardware via a Floating Point Unit [FPU], nor in software
// via built-in floating point math libraries as part of your processor's C implementation), then you may have
// to manually print the whole number and fractional number parts separately as follows. Look for the patterns.
// Be sure to make note of the following 2 points:
// - 1) the digits after the decimal are determined by the multiplier:
// 0 digits: * 10^0 ==> * 1 <== 0 zeros
// 1 digit : * 10^1 ==> * 10 <== 1 zero
// 2 digits: * 10^2 ==> * 100 <== 2 zeros
// 3 digits: * 10^3 ==> * 1000 <== 3 zeros
// 4 digits: * 10^4 ==> * 10000 <== 4 zeros
// 5 digits: * 10^5 ==> * 100000 <== 5 zeros
// - 2) Be sure to use the proper printf format statement to enforce the proper number of leading zeros in front of
// the fractional part of the number. ie: refer to the "%01", "%02", "%03", etc. below.
// Manual "floats":
// 0 digits after the decimal
printf("price (manual float, 0 digits after decimal) is %u.",
price >> FRACTION_BITS); print_if_error_introduced(0);
// 1 digit after the decimal
// Note: the fraction is cast to unsigned long long (at least 64 bits) and printed with %llu, since
// printing a uint64_t with %lu is only correct on platforms where long is 64 bits.
printf("price (manual float, 1 digit after decimal) is %u.%01llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 10 / FRACTION_DIVISOR);
print_if_error_introduced(1);
// 2 digits after decimal
printf("price (manual float, 2 digits after decimal) is %u.%02llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 100 / FRACTION_DIVISOR);
print_if_error_introduced(2);
// 3 digits after decimal
printf("price (manual float, 3 digits after decimal) is %u.%03llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 1000 / FRACTION_DIVISOR);
print_if_error_introduced(3);
// 4 digits after decimal
printf("price (manual float, 4 digits after decimal) is %u.%04llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 10000 / FRACTION_DIVISOR);
print_if_error_introduced(4);
// 5 digits after decimal
printf("price (manual float, 5 digits after decimal) is %u.%05llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 100000 / FRACTION_DIVISOR);
print_if_error_introduced(5);
// 6 digits after decimal
printf("price (manual float, 6 digits after decimal) is %u.%06llu.",
price >> FRACTION_BITS, (unsigned long long)(price & FRACTION_MASK) * 1000000 / FRACTION_DIVISOR);
print_if_error_introduced(6);
printf("\n");
// Manual "floats" ***with rounding now***:
// - To do rounding with integers, the concept is best understood by examples:
// BASE 10 CONCEPT:
// 1. To round to the nearest whole number:
// Add 1/2 to the number, then let it be truncated since it is an integer.
// Examples:
// 1.5 + 1/2 = 1.5 + 0.5 = 2.0. Truncate it to 2. Good!
// 1.99 + 0.5 = 2.49. Truncate it to 2. Good!
// 1.49 + 0.5 = 1.99. Truncate it to 1. Good!
// 2. To round to the nearest tenth place:
// Multiply by 10 (this is equivalent to doing a single base-10 left-shift), then add 1/2, then let
// it be truncated since it is an integer, then divide by 10 (this is a base-10 right-shift).
// Example:
// 1.57 x 10 + 1/2 = 15.7 + 0.5 = 16.2. Truncate to 16. Divide by 10 --> 1.6. Good.
// 3. To round to the nearest hundredth place:
// Multiply by 100 (base-10 left-shift 2 places), add 1/2, truncate, divide by 100 (base-10
// right-shift 2 places).
// Example:
// 1.579 x 100 + 1/2 = 157.9 + 0.5 = 158.4. Truncate to 158. Divide by 100 --> 1.58. Good.
//
// BASE 2 CONCEPT:
// - We are dealing with fractional numbers stored in base-2 binary bits, however, and we have already
// left-shifted by FRACTION_BITS (num << FRACTION_BITS) when we converted our numbers to fixed-point
// numbers. Therefore, *all we have to do* is add the proper value, and we get the same effect when we
// right-shift by FRACTION_BITS (num >> FRACTION_BITS) in our conversion back from fixed-point to regular
// numbers. Here's what that looks like for us:
// - Note: "addend" = "a number that is added to another".
// (see https://www.google.com/search?q=addend&oq=addend&aqs=chrome.0.0l6.1290j0j7&sourceid=chrome&ie=UTF-8).
// - Rounding to 0 digits means simply rounding to the nearest whole number.
// Round to: Addends:
// 0 digits: add 5/10 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/2
// 1 digits: add 5/100 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/20
// 2 digits: add 5/1000 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/200
// 3 digits: add 5/10000 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/2000
// 4 digits: add 5/100000 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/20000
// 5 digits: add 5/1000000 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/200000
// 6 digits: add 5/10000000 * FRACTION_DIVISOR ==> + FRACTION_DIVISOR/2000000
// etc.
printf("WITH MANUAL INTEGER-BASED ROUNDING:\n");
// Calculate addends used for rounding (see definition of "addend" above).
fixed_point_t addend0 = FRACTION_DIVISOR/2;
fixed_point_t addend1 = FRACTION_DIVISOR/20;
fixed_point_t addend2 = FRACTION_DIVISOR/200;
fixed_point_t addend3 = FRACTION_DIVISOR/2000;
fixed_point_t addend4 = FRACTION_DIVISOR/20000;
fixed_point_t addend5 = FRACTION_DIVISOR/200000;
// Print addends used for rounding.
printf("addend0 = %u.\n", addend0);
printf("addend1 = %u.\n", addend1);
printf("addend2 = %u.\n", addend2);
printf("addend3 = %u.\n", addend3);
printf("addend4 = %u.\n", addend4);
printf("addend5 = %u.\n", addend5);
// Calculate rounded prices
fixed_point_t price_rounded0 = price + addend0; // round to 0 decimal digits
fixed_point_t price_rounded1 = price + addend1; // round to 1 decimal digits
fixed_point_t price_rounded2 = price + addend2; // round to 2 decimal digits
fixed_point_t price_rounded3 = price + addend3; // round to 3 decimal digits
fixed_point_t price_rounded4 = price + addend4; // round to 4 decimal digits
fixed_point_t price_rounded5 = price + addend5; // round to 5 decimal digits
// Print manually rounded prices of manually-printed fixed point integers as though they were "floats".
printf("rounded price (manual float, rounded to 0 digits after decimal) is %u.\n",
price_rounded0 >> FRACTION_BITS);
// As above, cast to unsigned long long and print with %llu so the format matches on every platform.
printf("rounded price (manual float, rounded to 1 digit after decimal) is %u.%01llu.\n",
price_rounded1 >> FRACTION_BITS, (unsigned long long)(price_rounded1 & FRACTION_MASK) * 10 / FRACTION_DIVISOR);
printf("rounded price (manual float, rounded to 2 digits after decimal) is %u.%02llu.\n",
price_rounded2 >> FRACTION_BITS, (unsigned long long)(price_rounded2 & FRACTION_MASK) * 100 / FRACTION_DIVISOR);
printf("rounded price (manual float, rounded to 3 digits after decimal) is %u.%03llu.\n",
price_rounded3 >> FRACTION_BITS, (unsigned long long)(price_rounded3 & FRACTION_MASK) * 1000 / FRACTION_DIVISOR);
printf("rounded price (manual float, rounded to 4 digits after decimal) is %u.%04llu.\n",
price_rounded4 >> FRACTION_BITS, (unsigned long long)(price_rounded4 & FRACTION_MASK) * 10000 / FRACTION_DIVISOR);
printf("rounded price (manual float, rounded to 5 digits after decimal) is %u.%05llu.\n",
price_rounded5 >> FRACTION_BITS, (unsigned long long)(price_rounded5 & FRACTION_MASK) * 100000 / FRACTION_DIVISOR);
// =================================================================================================================
printf("\nRELATED CONCEPT: DOING LARGE-INTEGER MATH WITH SMALL INTEGER TYPES:\n");
// RELATED CONCEPTS:
// Now let's practice handling (doing math on) large integers (ie: large relative to their integer type),
// withOUT resorting to using larger integer types (because they may not exist for our target processor),
// and withOUT using floating point math, since that might also either not exist for our processor, or be too
// slow or program-space-intensive for our application.
// - These concepts are especially useful when you hit the limits of your architecture's integer types: ex:
// if you have a uint64_t nanosecond timestamp that is really large, and you need to multiply it by a fraction
// to convert it, but you don't have uint128_t types available to you to multiply by the numerator before
// dividing by the denominator. What do you do?
// - We can use fixed-point math to achieve desired results. Let's look at various approaches.
// - Let's say my goal is to multiply a number by a fraction < 1 withOUT it ever growing into a larger type.
// - Essentially we want to multiply some really large number (near its range limit for its integer type)
// by some_number/some_larger_number (ie: a fraction < 1). The problem is that if we multiply by the numerator
// first, it will overflow, and if we divide by the denominator first we will lose resolution via bits
// right-shifting out.
// Here are various examples and approaches.
// -----------------------------------------------------
// EXAMPLE 1
// Goal: Use only 16-bit values & math to find 65401 * 16/127.
// Result: Great! All 3 approaches work, with the 3rd being the best. To learn the techniques required for the
// absolute best approach of all, take a look at the 8th approach in Example 2 below.
// -----------------------------------------------------
uint16_t num16 = 65401; // 1111 1111 0111 1001
uint16_t times = 16;
uint16_t divide = 127;
printf("\nEXAMPLE 1\n");
// Find the true answer.
// First, let's cheat to know the right answer by letting it grow into a larger type.
// Multiply *first* (before doing the divide) to avoid losing resolution.
printf("%u * %u/%u = %u. <== true answer\n", num16, times, divide, (uint32_t)num16*times/divide);
// 1st approach: just divide first to prevent overflow, and lose precision right from the start.
uint16_t num16_result = num16/divide * times;
printf("1st approach (divide then multiply):\n");
printf(" num16_result = %u. <== Loses bits that right-shift out during the initial divide.\n", num16_result);
// 2nd approach: split the 16-bit number into 2 8-bit numbers stored in 16-bit numbers,
// placing all 8 bits of each sub-number to the ***far right***, with 8 bits on the left to grow
// into when multiplying. Then, multiply and divide each part separately.
// - The problem, however, is that you'll lose meaningful resolution on the upper-8-bit number when you
// do the division, since there's no bits to the right for the right-shifted bits during division to
// be retained in.
// Re-sum both sub-numbers at the end to get the final result.
// - NOTE THAT 257 IS THE HIGHEST *TIMES* VALUE I CAN USE SINCE 2^16/0b0000,0000,1111,1111 = 65536/255 = 257.00392.
// Therefore, any *times* value larger than this will cause overflow.
uint16_t num16_upper8 = num16 >> 8; // 1111 1111
uint16_t num16_lower8 = num16 & 0xFF; // 0111 1001
num16_upper8 *= times;
num16_lower8 *= times;
num16_upper8 /= divide;
num16_lower8 /= divide;
num16_result = (num16_upper8 << 8) + num16_lower8;
printf("2nd approach (split into 2 8-bit sub-numbers with bits at far right):\n");
printf(" num16_result = %u. <== Loses bits that right-shift out during the divide.\n", num16_result);
// 3rd approach: split the 16-bit number into 2 8-bit numbers stored in 16-bit numbers,
// placing all 8 bits of each sub-number ***in the center***, with 4 bits on the left to grow when
// multiplying and 4 bits on the right to not lose as many bits when dividing.
// This will help stop the loss of resolution when we divide, at the cost of overflowing more easily when we
// multiply.
// - NOTE THAT 16 IS THE HIGHEST *TIMES* VALUE I CAN USE SINCE 2^16/0b0000,1111,1111,0000 = 65536/4080 = 16.0627.
// Therefore, any *times* value larger than this will cause overflow.
num16_upper8 = (num16 >> 4) & 0x0FF0;
num16_lower8 = (num16 << 4) & 0x0FF0;
num16_upper8 *= times;
num16_lower8 *= times;
num16_upper8 /= divide;
num16_lower8 /= divide;
num16_result = (num16_upper8 << 4) + (num16_lower8 >> 4);
printf("3rd approach (split into 2 8-bit sub-numbers with bits centered):\n");
printf(" num16_result = %u. <== Perfect! Retains the bits that right-shift during the divide.\n", num16_result);
// -----------------------------------------------------
// EXAMPLE 2
// Goal: Use only 16-bit values & math to find 65401 * 99/127.
// Result: Many approaches work, so long as enough bits exist to the left to not allow overflow during the
// multiply. The best approach is the 8th one, however, which 1) right-shifts the minimum possible before the
// multiply, in order to retain as much resolution as possible, and 2) does integer rounding during the divide
// in order to be as accurate as possible. This is the best approach to use.
// -----------------------------------------------------
num16 = 65401; // 1111 1111 0111 1001
times = 99;
divide = 127;
printf("\nEXAMPLE 2\n");
// Find the true answer by letting it grow into a larger type.
printf("%u * %u/%u = %u. <== true answer\n", num16, times, divide, (uint32_t)num16*times/divide);
// 1st approach: just divide first to prevent overflow, and lose precision right from the start.
num16_result = num16/divide * times;
printf("1st approach (divide then multiply):\n");
printf(" num16_result = %u. <== Loses bits that right-shift out during the initial divide.\n", num16_result);
// 2nd approach: split the 16-bit number into 2 8-bit numbers stored in 16-bit numbers,
// placing all 8 bits of each sub-number to the ***far right***, with 8 bits on the left to grow
// into when multiplying. Then, multiply and divide each part separately.
// - The problem, however, is that you'll lose meaningful resolution on the upper-8-bit number when you
// do the division, since there's no bits to the right for the right-shifted bits during division to
// be retained in.
// Re-sum both sub-numbers at the end to get the final result.
// - NOTE THAT 257 IS THE HIGHEST *TIMES* VALUE I CAN USE SINCE 2^16/0b0000,0000,1111,1111 = 65536/255 = 257.00392.
// Therefore, any *times* value larger than this will cause overflow.
num16_upper8 = num16 >> 8; // 1111 1111
num16_lower8 = num16 & 0xFF; // 0111 1001
num16_upper8 *= times;
num16_lower8 *= times;
num16_upper8 /= divide;
num16_lower8 /= divide;
num16_result = (num16_upper8 << 8) + num16_lower8;
printf("2nd approach (split into 2 8-bit sub-numbers with bits at far right):\n");
printf(" num16_result = %u. <== Loses bits that right-shift out during the divide.\n", num16_result);
/////////////////////////////////////////////////////////////////////////////////////////////////
// TRUNCATED BECAUSE STACK OVERFLOW WON'T ALLOW THIS MANY CHARACTERS.
// See the rest of the code on github: https://github.com/ElectricRCAircraftGuy/fixed_point_math
/////////////////////////////////////////////////////////////////////////////////////////////////
return 0;
} // main
// PRIVATE FUNCTION DEFINITIONS:
/// @brief A function to help identify at what decimal digit error is introduced, based on how many bits you are using
/// to represent the fractional portion of the number in your fixed-point number system.
/// @details Note: this function relies on an internal static bool to keep track of if it has already
/// identified at what decimal digit error is introduced, so once it prints this fact once, it will never
/// print again. This is by design just to simplify usage in this demo.
/// @param[in] num_digits_after_decimal The number of decimal digits we are printing after the decimal
/// (0, 1, 2, 3, etc)
/// @return None
static void print_if_error_introduced(uint8_t num_digits_after_decimal)
{
static bool already_found = false;
// Array of power base 10 values, where the value = 10^index:
const uint32_t POW_BASE_10[] =
{
1, // index 0 (10^0)
10,
100,
1000,
10000,
100000,
1000000,
10000000,
100000000,
1000000000, // index 9 (10^9); 1 Billion: the max power of 10 that can be stored in a uint32_t
};
if (already_found == true)
{
goto done;
}
if (POW_BASE_10[num_digits_after_decimal] > FRACTION_DIVISOR)
{
already_found = true;
printf(" <== Fixed-point math decimal error first\n"
" starts to get introduced here since the fixed point resolution (1/%u) now has lower resolution\n"
" than the base-10 resolution (which is 1/%u) at this decimal place. Decimal error may not show\n"
" up at this decimal location, per say, but definitely will for all decimal places hereafter.",
FRACTION_DIVISOR, POW_BASE_10[num_digits_after_decimal]);
}
done:
printf("\n");
}
Output:
gabriel$ cp fixed_point_math.cpp fixed_point_math_copy.c && gcc -Wall -std=c99 -o ./bin/fixed_point_math_c fixed_point_math_copy.c && ./bin/fixed_point_math_c
Begin.
fraction bits = 16.
whole number bits = 16.
max whole number = 65535.

price as a true double is 219.857142857.
price as integer is 219.
price fractional part is 56173 (of 65536).
price fractional part as decimal is 0.857132 (56173/65536).
price (manual float, 0 digits after decimal) is 219.
price (manual float, 1 digit after decimal) is 219.8.
price (manual float, 2 digits after decimal) is 219.85.
price (manual float, 3 digits after decimal) is 219.857.
price (manual float, 4 digits after decimal) is 219.8571.
price (manual float, 5 digits after decimal) is 219.85713. <== Fixed-point math decimal error first
    starts to get introduced here since the fixed point resolution (1/65536) now has lower resolution
    than the base-10 resolution (which is 1/100000) at this decimal place. Decimal error may not show
    up at this decimal location, per say, but definitely will for all decimal places hereafter.
price (manual float, 6 digits after decimal) is 219.857131.

WITH MANUAL INTEGER-BASED ROUNDING:
addend0 = 32768.
addend1 = 3276.
addend2 = 327.
addend3 = 32.
addend4 = 3.
addend5 = 0.
rounded price (manual float, rounded to 0 digits after decimal) is 220.
rounded price (manual float, rounded to 1 digit after decimal) is 219.9.
rounded price (manual float, rounded to 2 digits after decimal) is 219.86.
rounded price (manual float, rounded to 3 digits after decimal) is 219.857.
rounded price (manual float, rounded to 4 digits after decimal) is 219.8571.
rounded price (manual float, rounded to 5 digits after decimal) is 219.85713.

RELATED CONCEPT: DOING LARGE-INTEGER MATH WITH SMALL INTEGER TYPES:

EXAMPLE 1
65401 * 16/127 = 8239. <== true answer
1st approach (divide then multiply):
  num16_result = 8224. <== Loses bits that right-shift out during the initial divide.
2nd approach (split into 2 8-bit sub-numbers with bits at far right):
  num16_result = 8207. <== Loses bits that right-shift out during the divide.
3rd approach (split into 2 8-bit sub-numbers with bits centered):
  num16_result = 8239. <== Perfect! Retains the bits that right-shift during the divide.

EXAMPLE 2
65401 * 99/127 = 50981. <== true answer
1st approach (divide then multiply):
  num16_result = 50886. <== Loses bits that right-shift out during the initial divide.
2nd approach (split into 2 8-bit sub-numbers with bits at far right):
  num16_result = 50782. <== Loses bits that right-shift out during the divide.
3rd approach (split into 2 8-bit sub-numbers with bits centered):
  num16_result = 1373. <== Completely wrong due to overflow during the multiply.
4th approach (split into 4 4-bit sub-numbers with bits centered):
  num16_result = 15870. <== Completely wrong due to overflow during the multiply.
5th approach (split into 8 2-bit sub-numbers with bits centered):
  num16_result = 50922. <== Loses a few bits that right-shift out during the divide.
6th approach (split into 16 1-bit sub-numbers with bits skewed left):
  num16_result = 50963. <== Loses the fewest possible bits that right-shift out during the divide.
7th approach (split into 16 1-bit sub-numbers with bits skewed left):
  num16_result = 50963. <== [same as 6th approach] Loses the fewest possible bits that right-shift out during the divide.
[BEST APPROACH OF ALL] 8th approach (split into 16 1-bit sub-numbers with bits skewed left, w/integer rounding during division):
  num16_result = 50967. <== Loses the fewest possible bits that right-shift out during the divide, & has better accuracy due to rounding during the divide.
References:
- [my repo] https://github.com/ElectricRCAircraftGuy/eRCaGuy_analogReadXXbit/blob/master/eRCaGuy_analogReadXXbit.cpp - see "Integer math rounding notes" at the bottom.
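I would not recommend doing this if your only purpose is to save memory. Errors in the price calculations can accumulate, and you are going to get it wrong.
If you really want to implement something similar, can you just take the minimum interval of the price and then use int and integer operations directly to manipulate your numbers? You only need to convert to a floating-point number when displaying, which makes your life easier.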