c++ - How do I get an implementation-independent version of std::uniform_int_distribution?

Question

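std::uniform_int_distribution accepts any of the PRNGs in <random>, including ones that are consistent across implementations and platforms.

However, std::uniform_int_distribution itself does not appear to be consistent across implementations, so I cannot rely on reproducing its results even with a common PRNG and seed. This also affects related functionality such as std::shuffle().

For example: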

#include <random>
#include <iostream>
#include <string>
#include <vector>
#include <cstddef>
#include <algorithm>

template<typename T>
void printvector(const std::string& title, const std::vector<T>& v)
{
        std::cout << title << ": { ";
        for (const auto& val : v) { std::cout<<val<<" "; }
        std::cout << "}" << std::endl;
}


int main()
{
        const static size_t SEED = 770;
        std::minstd_rand r1(SEED), r2(SEED), r3(SEED);

        std::vector<int> vPRNG;
        for (int i=0; i<10; ++i) { vPRNG.push_back((int)r1()); }

        std::vector<size_t> vUniform;
        std::uniform_int_distribution<int> D(0,301);
        for (int i=0; i<10; ++i) { vUniform.push_back(D(r2)); }

        std::vector<size_t> vShuffled {1,2,3,4,5,6,7,8,9,10};
        std::shuffle(vShuffled.begin(), vShuffled.end(), r3);

        printvector("PRNG", vPRNG);
        printvector("UniformDist", vUniform);
        printvector("Shuffled", vShuffled);
}

gives me different results on different systems, even though the PRNG itself generates exactly the same numbers:

System 1:

PRNG: { 37168670 1020024325 89133659 1161108648 699844555 131263448 1141139758 1001712868 940055376 1083593786 }
UniformDist: { 5 143 12 163 98 18 160 140 132 152 }
Shuffled: { 7 6 5 2 10 3 4 1 8 9 }

System 2:

PRNG: { 37168670 1020024325 89133659 1161108648 699844555 131263448 1141139758 1001712868 940055376 1083593786 }
UniformDist: { 19 298 170 22 53 7 43 67 96 255 }
Shuffled: { 3 7 4 1 5 2 6 9 10 8 }

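How do I correctly implement a uniform distribution that is consistent across different platforms and standard library implementations?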

score 2 · Accepted Answer

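Here is an example of a truly uniform distribution that uses rejection sampling to overcome the modulo problem. Rejection sampling is not a problem if the range (b - a + 1) is "short", but it can become problematic for very large ranges. Make sure that b - a + 1 does not underflow/overflow.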

#include <cassert>
#include <iterator>
#include <limits>
#include <type_traits>
#include <utility>

template <class IntType = int>
struct my_uniform_int_distribution
{
    using result_type = IntType;

    const result_type A, B;

    struct param_type
    {
        const result_type A, B;

        param_type(result_type aa, result_type bb)
         : A(aa), B(bb)
        {}
    };

    explicit my_uniform_int_distribution(const result_type a = 0, const result_type b = std::numeric_limits<result_type>::max())
     : A(a), B(b)
    {}

    explicit my_uniform_int_distribution(const param_type& params)
     : A(params.A), B(params.B)
    {}

    template <class Generator>
    result_type operator()(Generator& g) const
    {
        return rnd(g, A, B);
    }

    template <class Generator>
    result_type operator()(Generator& g, const param_type& params) const
    {
        return rnd(g, params.A, params.B);
    }

    result_type a() const
    {
        return A;
    }

    result_type b() const
    {
        return B;
    }

    result_type min() const
    {
        return A;
    }

    result_type max() const
    {
        return B;
    }

private:
    template <class Generator>
    result_type rnd(Generator& g, const result_type a, const result_type b) const
    {
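        using gen_type = typename Generator::result_type;
        static_assert(std::is_convertible<gen_type, result_type>::value, "Generator output must be convertible to result_type");
        // Note: generators with a non-zero minimum (std::minstd_rand has min() == 1) are rejected here.
        static_assert(Generator::min() == 0, "A non-zero Generator::min() would require handling the offset");
        const result_type range = b - a + 1; // the caller must ensure this does not overflow
        // Just for safety: the generator must be able to cover the whole range
        assert(range > 0 && static_cast<unsigned long long>(range) <= Generator::max());
        // Work in the generator's unsigned type so that large outputs are not
        // truncated or turned negative before the rejection test.
        const gen_type urange = static_cast<gen_type>(range);
        const gen_type reject_lim = Generator::max() % urange;
        gen_type n;
        do
        {
            n = g();
        }
        while (n <= reject_lim); // re-roll: the values that remain split evenly over the range
        return static_cast<result_type>(n % urange) + a;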
    }
};

template<class RandomIt, class UniformRandomBitGenerator>
void my_shuffle(RandomIt first, RandomIt last, UniformRandomBitGenerator&& g)
{
    typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;
    typedef my_uniform_int_distribution<diff_t> distr_t;
    typedef typename distr_t::param_type param_t;

    distr_t D;
    diff_t n = last - first;
    for (diff_t i = n-1; i > 0; --i)
    {
        std::swap(first[i], first[D(g, param_t(0, i))]);
    }
}
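
The modulo problem that the rejection loop above avoids can be seen in isolation with a tiny enumeration. The following is only an illustrative sketch: the helper names count_plain_modulo and count_with_rejection, and the idea of a generator that produces just the values 0..gen_max, are assumptions made for this example and are not part of the answer's code.

```cpp
#include <array>
#include <cassert>

// Reduce every output 0..gen_max of a hypothetical generator with a plain
// `% 3` and count how often each result 0, 1, 2 occurs.
std::array<int, 3> count_plain_modulo(unsigned gen_max)
{
    assert(gen_max >= 3);
    std::array<int, 3> counts{};
    for (unsigned n = 0; n <= gen_max; ++n)
        ++counts[n % 3];
    return counts;
}

// Same enumeration, but skip the values n <= gen_max % 3 first, which is the
// rejection rule used in the answer's rnd() function.
std::array<int, 3> count_with_rejection(unsigned gen_max)
{
    assert(gen_max >= 3);
    std::array<int, 3> counts{};
    const unsigned reject_lim = gen_max % 3;
    for (unsigned n = 0; n <= gen_max; ++n)
        if (n > reject_lim)
            ++counts[n % 3];
    return counts;
}
```

For gen_max = 7 the plain version counts 3, 3, 2 (the result 2 is under-represented), while the version with rejection counts 2, 2, 2. The price is that some generator outputs are thrown away, which is why the number of re-rolls matters for very large ranges.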

score 1 · Accepted Answer

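The boilerplate in Jonas's answer is actually very useful. I'm sorry for the harsh criticism. In any case, avoiding bias in a uniform distribution is very important. The simplest way to achieve that is to "re-roll" whenever the value delivered by the random generator falls outside the largest range that permits an unbiased mapping. This assumes that the generator's result type has at least as many bits as the distribution's result type (otherwise you may need to consume several generator outputs at once). Another important consideration is avoiding integer overflow when b - a + 1 would overflow result_type. So there are three main caveats:

Watch out if the URNG's result_type has fewer bits than the distribution's result type
Beware of bias
Beware of integer overflow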
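
As a rough illustration of how the three caveats might be handled together, here is a minimal sketch. The function name portable_uniform is hypothetical, it does not reproduce the output of any standard library or of Boost, and it assumes a generator with static min() and max() (as the <random> engines have) whose output fits in 64 bits. It only asserts on the first caveat rather than combining several generator outputs.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <type_traits>

// Hypothetical helper: returns a value in [a, b] using only unsigned
// arithmetic, whose results are fixed by the C++ standard.
template <class Int, class Generator>
Int portable_uniform(Generator& g, Int a, Int b)
{
    static_assert(std::is_integral<Int>::value, "Int must be an integer type");
    using U = std::uint64_t;
    assert(a <= b);

    // Integer overflow: compute b - a in unsigned arithmetic, where
    // wrap-around is well defined, instead of b - a + 1 in Int.
    const U span = static_cast<U>(b) - static_cast<U>(a);

    // Generator width: this sketch only checks that one output is enough.
    // A non-zero Generator::min() is handled by subtracting it below.
    const U gen_min = static_cast<U>(Generator::min());
    const U gen_span = static_cast<U>(Generator::max()) - gen_min;
    assert(span <= gen_span);

    // Converting back to Int is modular: guaranteed since C++20, and the
    // behaviour of mainstream compilers before that.
    if (span == gen_span) // every output maps to exactly one result
        return static_cast<Int>(static_cast<U>(a) + (static_cast<U>(g()) - gen_min));

    // Bias: accept only offsets 0..last_ok, a count that is a multiple of
    // range, and re-roll everything above it.
    const U range = span + 1; // cannot wrap because span < gen_span
    const U rem = (gen_span % range + 1) % range; // (gen_span + 1) % range without overflow
    const U last_ok = gen_span - rem;
    U x;
    do
    {
        x = static_cast<U>(g()) - gen_min;
    }
    while (x > last_ok);
    return static_cast<Int>(static_cast<U>(a) + x % range);
}
```

A call such as portable_uniform(engine, 0, 301) with a std::minstd_rand engine would then depend only on the engine's output sequence, not on the library's distribution code. Whether a sketch like this is preferable to copying a well-tested implementation is a separate question.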

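Given these challenges, it is no surprise that Boost's implementation runs to more than 150 LOC (including comments). If at all possible, I recommend sticking with one of the available implementations, because this is easy to get wrong. The problem with Boost is that the algorithm may change between versions, with or without notice. You can work around that by copying the Boost code into your project, so that you do not depend on a particular version. This could mean that your program has cross-platform "bug for bug" compatibility, if you are unlucky. (Of course, any implementation that is not provably correct can have this problem.)

Obviously, check the license terms before copying any library code into your project. For example, I think that copying libstdc++'s implementation might mean you have to distribute your program under the GPL and its copyleft terms.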