6

I'm on SDCC 2.8.0, so memory and code size are very limited. Say I have an input value between 0 and 127, and I want to scale it to 20 - 100. Normally I would do:

int scale(int input, int min, int max)
{
 // assuming max is always greater than min
 float range = (float)max - (float)min;
 int output = min + (int)((range / 127.f) * (float)input);
 return output;
}

Calling scale(64, 20, 100); gives me 60, which is exactly halfway between 20 and 100.

How can this be done without floating point? Any bit-shifting magic?


2 Answers

3

If (max-min)<(INT_MAX/127), then you can naively multiply (max-min)*input before dividing by 127.
Otherwise, you'll have to decompose the operations in order to avoid overflow and undefined behavior.
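
A minimal sketch of the simple case, assuming max >= min and that (max-min)*input fits in an int. The name scale_int is hypothetical and does not come from the question:

```c
/* Hypothetical helper: scales input (0..127) into min..max using only
   integer arithmetic. Assumes max >= min and (max - min) < INT_MAX / 127,
   so the multiplication below cannot overflow. */
int scale_int(int input, int min, int max)
{
    int range = max - min;
    /* Multiply first, then divide, so no precision is lost before the division. */
    return min + (range * input) / 127;
}
```

With the values from the question, scale_int(64, 20, 100) truncates 5120/127 to 40 and returns 60.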

In the latter case, a naive possibility would be to split both factors into a quotient and a remainder with respect to 127:

A=Q1*127+R1
B=Q2*127+R2
A*B = (Q1*Q2*127 + Q1*R2 + Q2*R1) * 127 + R1*R2
(A*B)/127 = Q1*Q2*127 + Q1*R2 + Q2*R1 + (R1*R2/127)

or in C:

unsigned int range=max-min;
unsigned int output = min
    + (range/127)*(input/127)*127
    + (range/127)*(input%127)
    + (range%127)*(input/127)
    + (range%127)*(input%127) / 127;
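
For illustration, the expression above could be wrapped in a function like the following. This is only a sketch: the name scale_decomposed and the use of uint16_t are assumptions, as is the restriction to input <= 127 taken from the question.

```c
#include <stdint.h>

/* Hypothetical wrapper around the quotient/remainder decomposition.
   Assumes input <= 127 and max >= min. Under those assumptions every
   partial product stays within 16 bits, even for range up to 65535. */
uint16_t scale_decomposed(uint16_t input, uint16_t min, uint16_t max)
{
    uint16_t range = max - min;
    uint16_t q1 = range / 127, r1 = range % 127;   /* range = q1*127 + r1 */
    uint16_t q2 = input / 127, r2 = input % 127;   /* input = q2*127 + r2 */

    return min
        + q1 * q2 * 127
        + q1 * r2
        + q2 * r1
        + (r1 * r2) / 127;
}
```

For example, scale_decomposed(64, 0, 60000) yields 30236, the truncated value of 60000*64/127, although the full product 3840000 would not fit in 16 bits.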

There is almost certainly a more efficient formulation using bit shifting (>>8). The compiler might already do this well, but maybe not well enough, so we might do better by helping it:

A=Q1*128+R1
B= 0*128+R2 (because B<=127)
A*B = (Q1*R2) * (127+1) + R1*R2
(A*B)/127 = Q1*R2 + (Q1*R2 + R1*R2)/127

and in C:
EDIT
Ahem, my intention was to divide by 128, that is >>7, and I incorrectly wrote >>8. The same goes for the remainder, which should be &0x7F, not &0xFF.
It's certainly better to be less obscure and just write /128 and %128 because we can trust the compiler to translate these ops into simple bit ops nowadays...

unsigned int range=max-min;
unsigned int high=(range / 128)*input;
unsigned int low =(range % 128)*input;
unsigned int output = min + high + (high+low)/127;

EDIT2
For balancing the distribution a little bit better, we might apply some sort of rounding rather than truncation like this:

unsigned int output = min + high + (high+low+63)/127;
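
Put together as a function, the rounded variant might look like this. It is a sketch under stated assumptions: the name scale_128 is hypothetical, input is assumed to be at most 127, and max - min is assumed to be at most 32767 so that high + low + 63 still fits in 16 bits.

```c
#include <stdint.h>

/* Hypothetical wrapper for the /128 decomposition with rounding.
   Assumes input <= 127, max >= min and (max - min) <= 32767. */
uint16_t scale_128(uint16_t input, uint16_t min, uint16_t max)
{
    uint16_t range = max - min;
    uint16_t high = (range / 128) * input;   /* Q1 * input */
    uint16_t low  = (range % 128) * input;   /* R1 * input */

    /* range*input == 127*high + (high + low); adding 63 before the
       division rounds to nearest instead of truncating. */
    return min + high + (high + low + 63) / 127;
}
```

With rounding, scale_128(1, 20, 100) gives 21 (80/127 is about 0.63), where the truncating version would give 20.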
answered 2014-01-28T19:30:41.810
2

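I know this is an old thread, but I just wanted to share some tricks you can use to scale by floating-point factors more efficiently when the scaling constant is fixed and known in advance. Compilers often use these tricks when dividing by integer literals, to avoid the usually expensive div instruction (which can take dozens of cycles on many architectures).

Obviously, unless you really need to shave those few cycles off each scaling operation, this is the very definition of premature optimization.

Anyway, the idea is to change your floating-point factor into an approximation whose denominator is a power of two, so that you can replace the division with a multiplication (typically 1 cycle) and a right shift (again typically 1 cycle for operations on integers matching the architecture's word size).

In your case, the goal is to replace the 1/127 part with a right shift, i.e. a division by a power of two. Since you need to scale by 80/127 (approximately 0.62992) and your input fits in 7 bits, you can choose something like 161/256. (I presumed you have a 16-bit controller, so I just multiplied 0.62992 by 256, which gives about 161.26, since your input values all fit in the low byte of the word.)

So the function becomes: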

// scale 0..127 into 20..100
uint8_t scale(uint8_t input)
{
    uint16_t low = input * 161;   // <- this will move the result into the high 8 bits
    low += 1 << 7;                // <- adding a "half bit" before shifting (like +0.5)
    low >>= 8;                    // <- cheap division by 256
    low += 20;                    // <- and finally, add offset

    return (uint8_t)(low);
}
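
As a rough sketch of where a constant like 161 comes from, the multiplier can be computed with integer arithmetic alone. The helper names scale_factor and scale_fixed are hypothetical, and both assume their intermediate values fit in 32 bits:

```c
#include <stdint.h>

/* Hypothetical helper: nearest integer to (range / 127) * 2^shift.
   Assumes (range << shift) + 63 fits in 32 bits. */
uint32_t scale_factor(uint32_t range, uint8_t shift)
{
    return ((range << shift) + 63) / 127;
}

/* Multiply, add half of the divisor for rounding, shift, add the offset.
   Assumes input <= 127, shift >= 1 and no 32-bit overflow. */
uint32_t scale_fixed(uint8_t input, uint32_t factor, uint8_t shift, uint32_t offset)
{
    uint32_t acc = (uint32_t)input * factor;
    acc += (uint32_t)1 << (shift - 1);   /* the "half bit" */
    acc >>= shift;                       /* divide by 2^shift */
    return acc + offset;
}
```

Here scale_factor(80, 8) evaluates to 161, and scale_fixed(64, 161, 8, 20) gives 60, matching the original floating-point version for that input.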

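On a 32-bit microcontroller, you could choose a larger factor to get a better approximation. The CPU/compiler is usually faster when working with the native word size, since it doesn't need to truncate or extend register values to handle smaller integer sizes.

Since 127 needs 7 bits, you can choose a 24-bit denominator and still be sure that the multiplied value fits in a 32-bit word, i.e.: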

// 0.62992 == 10568325 / 16777216 == 10568325 / (1<<24)
uint8_t scale_32(uint8_t input)
{
    uint32_t low = (uint32_t)input * 10568325UL;  // force 32-bit arithmetic even where int is 16 bits
    low += 1UL << 23;                             // rounding term, half of 1<<24
    low >>= 24;
    low += 20;
    return (uint8_t)(low);
}
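
To get a feel for how much the choice of denominator matters, one could compare each approximation against the exactly rounded value of input * 80 / 127 over all 128 inputs. The following is only a sketch; count_mismatches is a hypothetical name and the range of 80 is hard-coded from the question:

```c
#include <stdint.h>

/* Hypothetical check: counts inputs in 0..127 for which the fixed-point
   result differs from input * 80 / 127 rounded to nearest.
   Assumes shift >= 1 and 127 * factor + 2^(shift-1) fits in 32 bits. */
unsigned count_mismatches(uint32_t factor, uint8_t shift)
{
    unsigned mismatches = 0;
    uint32_t input;

    for (input = 0; input <= 127; input++) {
        uint32_t approx = (input * factor + ((uint32_t)1 << (shift - 1))) >> shift;
        uint32_t exact  = (input * 80 + 63) / 127;   /* round to nearest */
        if (approx != exact)
            mismatches++;
    }
    return mismatches;
}
```

With the 8-bit denominator (factor 161, shift 8) the result is off by one for at least one input: 77 maps to 68 rather than 69. With the 24-bit denominator the factor error is small enough that no input should differ.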

You can use the godbolt online compiler to compare the assembly generated for these functions by different compilers and architectures.

answered 2019-01-29T00:11:35.900