c++ - 做定点数学的最佳方法是什么？

Question

我需要为没有 FPU 的 Nintendo DS 加速一个程序，所以我需要将浮点数学（模拟且缓慢）更改为定点。

我是如何开始的，我将浮点数更改为整数，每当需要转换它们时，我使用x>>8将定点变量 x 转换为实际数字，并使用x<<8转换为定点。很快我发现不可能跟踪需要转换的内容，并且我还意识到很难更改数字的精度（在这种情况下为 8。）

我的问题是，我应该如何使这更容易并且仍然快速？我应该制作一个 FixedPoint 类，还是只制作一个 FixedPoint8 typedef 或带有一些函数/宏的结构来转换它们，或者其他什么？我应该在变量名中添加一些东西来显示它的定点吗？

score 54 · Accepted Answer

你可以试试我的定点类（最新可用@ https://github.com/eteran/cpp-utilities）

// From: https://github.com/eteran/cpp-utilities/edit/master/Fixed.h
// See also: http://stackoverflow.com/questions/79677/whats-the-best-way-to-do-fixed-point-math
/*
 * The MIT License (MIT)
 * 
 * Copyright (c) 2015 Evan Teran
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#ifndef FIXED_H_
#define FIXED_H_

#include <ostream>
#include <exception>
#include <cstddef> // for size_t
#include <cstdint>
#include <type_traits>

#include <boost/operators.hpp>

namespace numeric {

template <size_t I, size_t F>
class Fixed;

namespace detail {

// helper templates to make magic with types :)
// these allow us to determine resonable types from
// a desired size, they also let us infer the next largest type
// from a type which is nice for the division op
template <size_t T>
struct type_from_size {
    static const bool is_specialized = false;
    typedef void      value_type;
};

#if defined(__GNUC__) && defined(__x86_64__)
template <>
struct type_from_size<128> {
    static const bool           is_specialized = true;
    static const size_t         size = 128;
    typedef __int128            value_type;
    typedef unsigned __int128   unsigned_type;
    typedef __int128            signed_type;
    typedef type_from_size<256> next_size;
};
#endif

template <>
struct type_from_size<64> {
    static const bool           is_specialized = true;
    static const size_t         size = 64;
    typedef int64_t             value_type;
    typedef uint64_t            unsigned_type;
    typedef int64_t             signed_type;
    typedef type_from_size<128> next_size;
};

template <>
struct type_from_size<32> {
    static const bool          is_specialized = true;
    static const size_t        size = 32;
    typedef int32_t            value_type;
    typedef uint32_t           unsigned_type;
    typedef int32_t            signed_type;
    typedef type_from_size<64> next_size;
};

template <>
struct type_from_size<16> {
    static const bool          is_specialized = true;
    static const size_t        size = 16;
    typedef int16_t            value_type;
    typedef uint16_t           unsigned_type;
    typedef int16_t            signed_type;
    typedef type_from_size<32> next_size;
};

template <>
struct type_from_size<8> {
    static const bool          is_specialized = true;
    static const size_t        size = 8;
    typedef int8_t             value_type;
    typedef uint8_t            unsigned_type;
    typedef int8_t             signed_type;
    typedef type_from_size<16> next_size;
};

// this is to assist in adding support for non-native base
// types (for adding big-int support), this should be fine
// unless your bit-int class doesn't nicely support casting
template <class B, class N>
B next_to_base(const N& rhs) {
    return static_cast<B>(rhs);
}

struct divide_by_zero : std::exception {
};

template <size_t I, size_t F>
Fixed<I,F> divide(const Fixed<I,F> &numerator, const Fixed<I,F> &denominator, Fixed<I,F> &remainder, typename std::enable_if<type_from_size<I+F>::next_size::is_specialized>::type* = 0) {

    typedef typename Fixed<I,F>::next_type next_type;
    typedef typename Fixed<I,F>::base_type base_type;
    static const size_t fractional_bits = Fixed<I,F>::fractional_bits;

    next_type t(numerator.to_raw());
    t <<= fractional_bits;

    Fixed<I,F> quotient;

    quotient  = Fixed<I,F>::from_base(next_to_base<base_type>(t / denominator.to_raw()));
    remainder = Fixed<I,F>::from_base(next_to_base<base_type>(t % denominator.to_raw()));

    return quotient;
}

template <size_t I, size_t F>
Fixed<I,F> divide(Fixed<I,F> numerator, Fixed<I,F> denominator, Fixed<I,F> &remainder, typename std::enable_if<!type_from_size<I+F>::next_size::is_specialized>::type* = 0) {

    // NOTE(eteran): division is broken for large types :-(
    // especially when dealing with negative quantities

    typedef typename Fixed<I,F>::base_type     base_type;
    typedef typename Fixed<I,F>::unsigned_type unsigned_type;

    static const int bits = Fixed<I,F>::total_bits;

    if(denominator == 0) {
        throw divide_by_zero();
    } else {

        int sign = 0;

        Fixed<I,F> quotient;

        if(numerator < 0) {
            sign ^= 1;
            numerator = -numerator;
        }

        if(denominator < 0) {
            sign ^= 1;
            denominator = -denominator;
        }

            base_type n      = numerator.to_raw();
            base_type d      = denominator.to_raw();
            base_type x      = 1;
            base_type answer = 0;

            // egyptian division algorithm
            while((n >= d) && (((d >> (bits - 1)) & 1) == 0)) {
                x <<= 1;
                d <<= 1;
            }

            while(x != 0) {
                if(n >= d) {
                    n      -= d;
                    answer += x;
                }

                x >>= 1;
                d >>= 1;
            }

            unsigned_type l1 = n;
            unsigned_type l2 = denominator.to_raw();

            // calculate the lower bits (needs to be unsigned)
            // unfortunately for many fractions this overflows the type still :-/
            const unsigned_type lo = (static_cast<unsigned_type>(n) << F) / denominator.to_raw();

            quotient  = Fixed<I,F>::from_base((answer << F) | lo);
            remainder = n;

        if(sign) {
            quotient = -quotient;
        }

        return quotient;
    }
}

// this is the usual implementation of multiplication
template <size_t I, size_t F>
void multiply(const Fixed<I,F> &lhs, const Fixed<I,F> &rhs, Fixed<I,F> &result, typename std::enable_if<type_from_size<I+F>::next_size::is_specialized>::type* = 0) {

    typedef typename Fixed<I,F>::next_type next_type;
    typedef typename Fixed<I,F>::base_type base_type;

    static const size_t fractional_bits = Fixed<I,F>::fractional_bits;

    next_type t(static_cast<next_type>(lhs.to_raw()) * static_cast<next_type>(rhs.to_raw()));
    t >>= fractional_bits;
    result = Fixed<I,F>::from_base(next_to_base<base_type>(t));
}

// this is the fall back version we use when we don't have a next size
// it is slightly slower, but is more robust since it doesn't
// require and upgraded type
template <size_t I, size_t F>
void multiply(const Fixed<I,F> &lhs, const Fixed<I,F> &rhs, Fixed<I,F> &result, typename std::enable_if<!type_from_size<I+F>::next_size::is_specialized>::type* = 0) {

    typedef typename Fixed<I,F>::base_type base_type;

    static const size_t fractional_bits = Fixed<I,F>::fractional_bits;
    static const size_t integer_mask    = Fixed<I,F>::integer_mask;
    static const size_t fractional_mask = Fixed<I,F>::fractional_mask;

    // more costly but doesn't need a larger type
    const base_type a_hi = (lhs.to_raw() & integer_mask) >> fractional_bits;
    const base_type b_hi = (rhs.to_raw() & integer_mask) >> fractional_bits;
    const base_type a_lo = (lhs.to_raw() & fractional_mask);
    const base_type b_lo = (rhs.to_raw() & fractional_mask);

    const base_type x1 = a_hi * b_hi;
    const base_type x2 = a_hi * b_lo;
    const base_type x3 = a_lo * b_hi;
    const base_type x4 = a_lo * b_lo;

    result = Fixed<I,F>::from_base((x1 << fractional_bits) + (x3 + x2) + (x4 >> fractional_bits));

}
}

/*
 * inheriting from boost::operators enables us to be a drop in replacement for base types
 * without having to specify all the different versions of operators manually
 */
template <size_t I, size_t F>
class Fixed : boost::operators<Fixed<I,F>> {
    static_assert(detail::type_from_size<I + F>::is_specialized, "invalid combination of sizes");

public:
    static const size_t fractional_bits = F;
    static const size_t integer_bits    = I;
    static const size_t total_bits      = I + F;

    typedef detail::type_from_size<total_bits>             base_type_info;

    typedef typename base_type_info::value_type            base_type;
    typedef typename base_type_info::next_size::value_type next_type;
    typedef typename base_type_info::unsigned_type         unsigned_type;

public:
    static const size_t base_size          = base_type_info::size;
    static const base_type fractional_mask = ~((~base_type(0)) << fractional_bits);
    static const base_type integer_mask    = ~fractional_mask;

public:
    static const base_type one = base_type(1) << fractional_bits;

public: // constructors
    Fixed() : data_(0) {
    }

    Fixed(long n) : data_(base_type(n) << fractional_bits) {
        // TODO(eteran): assert in range!
    }

    Fixed(unsigned long n) : data_(base_type(n) << fractional_bits) {
        // TODO(eteran): assert in range!
    }

    Fixed(int n) : data_(base_type(n) << fractional_bits) {
        // TODO(eteran): assert in range!
    }

    Fixed(unsigned int n) : data_(base_type(n) << fractional_bits) {
        // TODO(eteran): assert in range!
    }

    Fixed(float n) : data_(static_cast<base_type>(n * one)) {
        // TODO(eteran): assert in range!
    }

    Fixed(double n) : data_(static_cast<base_type>(n * one))  {
        // TODO(eteran): assert in range!
    }

    Fixed(const Fixed &o) : data_(o.data_) {
    }

    Fixed& operator=(const Fixed &o) {
        data_ = o.data_;
        return *this;
    }

private:
    // this makes it simpler to create a fixed point object from
    // a native type without scaling
    // use "Fixed::from_base" in order to perform this.
    struct NoScale {};

    Fixed(base_type n, const NoScale &) : data_(n) {
    }

public:
    static Fixed from_base(base_type n) {
        return Fixed(n, NoScale());
    }

public: // comparison operators
    bool operator==(const Fixed &o) const {
        return data_ == o.data_;
    }

    bool operator<(const Fixed &o) const {
        return data_ < o.data_;
    }

public: // unary operators
    bool operator!() const {
        return !data_;
    }

    Fixed operator~() const {
        Fixed t(*this);
        t.data_ = ~t.data_;
        return t;
    }

    Fixed operator-() const {
        Fixed t(*this);
        t.data_ = -t.data_;
        return t;
    }

    Fixed operator+() const {
        return *this;
    }

    Fixed& operator++() {
        data_ += one;
        return *this;
    }

    Fixed& operator--() {
        data_ -= one;
        return *this;
    }

public: // basic math operators
    Fixed& operator+=(const Fixed &n) {
        data_ += n.data_;
        return *this;
    }

    Fixed& operator-=(const Fixed &n) {
        data_ -= n.data_;
        return *this;
    }

    Fixed& operator&=(const Fixed &n) {
        data_ &= n.data_;
        return *this;
    }

    Fixed& operator|=(const Fixed &n) {
        data_ |= n.data_;
        return *this;
    }

    Fixed& operator^=(const Fixed &n) {
        data_ ^= n.data_;
        return *this;
    }

    Fixed& operator*=(const Fixed &n) {
        detail::multiply(*this, n, *this);
        return *this;
    }

    Fixed& operator/=(const Fixed &n) {
        Fixed temp;
        *this = detail::divide(*this, n, temp);
        return *this;
    }

    Fixed& operator>>=(const Fixed &n) {
        data_ >>= n.to_int();
        return *this;
    }

    Fixed& operator<<=(const Fixed &n) {
        data_ <<= n.to_int();
        return *this;
    }

public: // conversion to basic types
    int to_int() const {
        return (data_ & integer_mask) >> fractional_bits;
    }

    unsigned int to_uint() const {
        return (data_ & integer_mask) >> fractional_bits;
    }

    float to_float() const {
        return static_cast<float>(data_) / Fixed::one;
    }

    double to_double() const        {
        return static_cast<double>(data_) / Fixed::one;
    }

    base_type to_raw() const {
        return data_;
    }

public:
    void swap(Fixed &rhs) {
        using std::swap;
        swap(data_, rhs.data_);
    }

public:
    base_type data_;
};

// if we have the same fractional portion, but differing integer portions, we trivially upgrade the smaller type
template <size_t I1, size_t I2, size_t F>
typename std::conditional<I1 >= I2, Fixed<I1,F>, Fixed<I2,F>>::type operator+(const Fixed<I1,F> &lhs, const Fixed<I2,F> &rhs) {

    typedef typename std::conditional<
        I1 >= I2,
        Fixed<I1,F>,
        Fixed<I2,F>
    >::type T;

    const T l = T::from_base(lhs.to_raw());
    const T r = T::from_base(rhs.to_raw());
    return l + r;
}

template <size_t I1, size_t I2, size_t F>
typename std::conditional<I1 >= I2, Fixed<I1,F>, Fixed<I2,F>>::type operator-(const Fixed<I1,F> &lhs, const Fixed<I2,F> &rhs) {

    typedef typename std::conditional<
        I1 >= I2,
        Fixed<I1,F>,
        Fixed<I2,F>
    >::type T;

    const T l = T::from_base(lhs.to_raw());
    const T r = T::from_base(rhs.to_raw());
    return l - r;
}

template <size_t I1, size_t I2, size_t F>
typename std::conditional<I1 >= I2, Fixed<I1,F>, Fixed<I2,F>>::type operator*(const Fixed<I1,F> &lhs, const Fixed<I2,F> &rhs) {

    typedef typename std::conditional<
        I1 >= I2,
        Fixed<I1,F>,
        Fixed<I2,F>
    >::type T;

    const T l = T::from_base(lhs.to_raw());
    const T r = T::from_base(rhs.to_raw());
    return l * r;
}

template <size_t I1, size_t I2, size_t F>
typename std::conditional<I1 >= I2, Fixed<I1,F>, Fixed<I2,F>>::type operator/(const Fixed<I1,F> &lhs, const Fixed<I2,F> &rhs) {

    typedef typename std::conditional<
        I1 >= I2,
        Fixed<I1,F>,
        Fixed<I2,F>
    >::type T;

    const T l = T::from_base(lhs.to_raw());
    const T r = T::from_base(rhs.to_raw());
    return l / r;
}

template <size_t I, size_t F>
std::ostream &operator<<(std::ostream &os, const Fixed<I,F> &f) {
    os << f.to_double();
    return os;
}

template <size_t I, size_t F>
const size_t Fixed<I,F>::fractional_bits;

template <size_t I, size_t F>
const size_t Fixed<I,F>::integer_bits;

template <size_t I, size_t F>
const size_t Fixed<I,F>::total_bits;

}

#endif

它被设计成几乎可以替代浮点数/双精度数，并且具有可选择的精度。它确实使用 boost 来添加所有必要的数学运算符重载，所以你也需要它（我相信这只是一个头文件依赖，而不是库依赖）。

顺便说一句，常见的用法可能是这样的：

using namespace numeric;
typedef Fixed<16, 16> fixed;
fixed f;

唯一真正的规则是数字必须加起来等于系统的本机大小，例如 8、16、32、64。

score 34 · Accepted Answer

在现代 C++ 实现中，使用简单和精简的抽象（例如具体类）不会降低性能。定点计算正是使用适当设计的类可以避免大量错误的地方。

因此，您应该编写一个 FixedPoint8 类。彻底测试和调试它。如果您必须说服自己与使用普通整数相比它的性能，请测量它。

通过将定点计算的复杂性转移到一个地方，它将使您免于许多麻烦。

FixedPoint8如果您愿意，您可以通过将其设置为模板并将旧的替换为，例如，进一步增加您的类的实用性，typedef FixedPoint<short, 8> FixedPoint8;但是在您的目标架构上，这可能不是必需的，因此首先要避免模板的复杂性。

互联网上的某个地方可能有一个很好的定点类 - 我会开始从Boost库中查找。

score 10 · Accepted Answer

您的浮点代码是否真的使用小数点？如果是这样：

首先，您必须阅读 Randy Yates 关于定点数学简介的论文： http ://www.digitalsignallabs.com/fp.pdf

然后，您需要对浮点代码进行“分析”，以找出代码中“关键”点所需的定点值的适当范围，例如 U(5,3) = 左侧 5 位，3 位右边，未署名。

此时，可以套用上述论文中的算术规则；规则指定如何解释算术运算产生的位。您可以编写宏或函数来执行这些操作。

保留浮点版本很方便，以便比较浮点与定点的结果。

score 8 · Accepted Answer

如果没有特殊的硬件来处理它，我根本不会在 CPU 上使用浮点。我的建议是将所有数字视为缩放到特定因子的整数。例如，所有货币值都以整数表示的美分，而不是浮点数表示的美元。例如，0.72 表示为整数 72。

加法和减法是一个非常简单的整数运算，例如（0.72 + 1 变为 72 + 100 变为 172 变为 1.72）。

乘法稍微复杂一些，因为它需要一个整数乘法，然后是一个比例缩减，例如（0.72 * 2 变成 72 * 200 变成 14400 变成 144（缩减）变成 1.44）。

这可能需要特殊函数来执行更复杂的数学运算（正弦、余弦等），但即使是那些也可以通过使用查找表来加速。示例：由于您使用的是固定 2 表示，因此 (0.0,1] (0-99) 范围内只有 100 个值，并且 sin/cos 在此范围之外重复，因此您只需要一个 100 整数查找表。

干杯，帕克斯。

score 6 · Accepted Answer

更改定点表示通常称为“缩放”。

如果你可以用一个没有性能损失的类来做到这一点，那么这就是要走的路。它在很大程度上取决于编译器及其内联方式。如果使用类会降低性能，那么您需要更传统的 C 风格方法。OOP 方法将为您提供传统实现仅近似的编译器强制类型安全。

@cibyr 有一个很好的 OOP 实现。现在换一种更传统的。

要跟踪缩放了哪些变量，您需要使用一致的约定。在每个变量名的末尾做一个符号表示该值是否被缩放，并编写宏 SCALE() 和 UNSCALE() 扩展为 x>>8 和 x<<8。

#define SCALE(x) (x>>8)
#define UNSCALE(x) (x<<8)

xPositionUnscaled = UNSCALE(10);
xPositionScaled = SCALE(xPositionUnscaled);

使用这么多符号似乎是一项额外的工作，但请注意如何在不查看其他行的情况下一眼看出任何行是正确的。例如：

xPositionScaled = SCALE(xPositionScaled);

显然是错误的，通过检查。

这是Joel 在这篇文章中提到的Apps Hungarian理念的一种变体。

score 6 · Accepted Answer

当我第一次遇到定点数时，我发现 Joe Lemieux 的文章Fixed-point Math in C非常有帮助，它确实提出了一种表示定点值的方法。

不过，我并没有最终使用他的联合表示来表示定点数。我主要有 C 中定点的经验，所以我也没有选择使用类。不过，在大多数情况下，我认为在宏中定义小数位数并使用描述性变量名称会使这很容易使用。另外，我发现最好有用于乘法，尤其是除法的宏或函数，否则你很快就会得到不可读的代码。

例如，使用 24.8 值：

 #include "stdio.h"

/* Declarations for fixed point stuff */

typedef int int_fixed;

#define FRACT_BITS 8
#define FIXED_POINT_ONE (1 << FRACT_BITS)
#define MAKE_INT_FIXED(x) ((x) << FRACT_BITS)
#define MAKE_FLOAT_FIXED(x) ((int_fixed)((x) * FIXED_POINT_ONE))
#define MAKE_FIXED_INT(x) ((x) >> FRACT_BITS)
#define MAKE_FIXED_FLOAT(x) (((float)(x)) / FIXED_POINT_ONE)

#define FIXED_MULT(x, y) ((x)*(y) >> FRACT_BITS)
#define FIXED_DIV(x, y) (((x)<<FRACT_BITS) / (y))

/* tests */
int main()
{
    int_fixed fixed_x = MAKE_FLOAT_FIXED( 4.5f );
    int_fixed fixed_y = MAKE_INT_FIXED( 2 );

    int_fixed fixed_result = FIXED_MULT( fixed_x, fixed_y );
    printf( "%.1f\n", MAKE_FIXED_FLOAT( fixed_result ) );

    fixed_result = FIXED_DIV( fixed_result, fixed_y );
    printf( "%.1f\n", MAKE_FIXED_FLOAT( fixed_result ) );

    return 0;
}

哪个写出来

9.0
4.5

请注意，这些宏存在各种整数溢出问题，我只是想让宏保持简单。这只是我如何在 C 中完成此操作的一个快速而肮脏的示例。在 C++ 中，您可以使用运算符重载使事情变得更清晰。实际上，您也可以轻松地使 C 代码更漂亮......

我想这是一种冗长的说法：我认为使用 typedef 和宏方法是可以的。只要您清楚哪些变量包含定点值，维护起来并不难，但它可能不会像 C++ 类那样漂亮。

如果我处于你的位置，我会尝试获取一些分析数据来显示瓶颈在哪里。如果它们相对较少，则使用 typedef 和宏。如果您决定需要用定点数学全局替换所有浮点数，那么您可能会更好地使用一个类。

score 4 · Accepted Answer

4

Tricks of the Game Programming Gurus的原始版本有一整章是关于实现定点数学的。

于 2008-09-17T03:40:54.850 回答

score 2 · Accepted Answer

template <int precision = 8> class FixedPoint {
private:
    int val_;
public:
    inline FixedPoint(int val) : val_ (val << precision) {};
    inline operator int() { return val_ >> precision; }
    // Other operators...
};

score 0 · Accepted Answer

无论您决定采用哪种方式（我倾向于使用 typedef 和一些 CPP 宏进行转换），您都需要小心谨慎地来回转换。

您可能会发现您永远不需要来回转换。想象一下整个系统中的一切都是 x256。

c++ - 做定点数学的最佳方法是什么？

9 回答 9

Related

Reference