c++ - Should I call reset() on my C++ std random distribution to clear hidden state?

Question

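I would like to wrap the random number distributions from the C++11 standard library in simple functions that take the distribution's parameters and a generator instance as arguments. For example: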

#include <random>

double normal(double mean, double sd, std::mt19937_64& generator)
{
    static std::normal_distribution<double> dist;
    return dist(generator, std::normal_distribution<double>::param_type(mean, sd));
}

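I would like to avoid any hidden state in the distribution object, so that each call to this wrapper function depends only on the arguments it is given. (Potentially, each call to this function could take a different generator instance.) Ideally, I would make the distribution instance static const to ensure this; however, the distribution's operator() is not a const member function, so that is not possible.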

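My question is: to ensure that there is no hidden state in the distribution, is calling reset() on the distribution before every call 1) necessary and 2) sufficient? For example: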

double normal(double mean, double sd, std::mt19937_64& generator)
{
    static std::normal_distribution<double> dist;
    dist.reset();
    return dist(generator, std::normal_distribution<double>::param_type(mean, sd));
}

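(More generally, I am confused about the purpose of reset() on random distributions... I understand why one sometimes needs to reset/reseed the generator, but why would one need to reset a distribution object?)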

score 8 · Accepted Answer

To ensure that there is no hidden state in the distribution, is it 1) necessary

Yes.

and 2) sufficient to call reset() on the distribution every time?

Yes.

You probably don't want to do this though. At least not on every single call. The std::normal_distribution is the poster-child for allowing distributions to maintain state. For example a popular implementation will use the Box-Muller transformation to compute two random numbers at once, but hand you back only one of them, saving the other for the next time you call. Calling reset() prior to the next call would cause the distribution to throw away this already valid result, and cut the efficiency of the algorithm in half.
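To make the caching concrete, here is a minimal sketch of a hypothetical CachingNormal class (the class name and its pairs_generated() counter are invented for illustration and are not part of <random>). It uses the basic Box-Muller transform, keeps the second value of each pair for the next call, and has a reset() that discards that cached value. Real standard library implementations may use a different algorithm, so treat this only as a model of the idea.

```cpp
#include <cmath>
#include <random>

// Hypothetical, simplified normal distribution that caches the second value
// of each Box-Muller pair. It illustrates the caching idea only; it is not
// the code of any particular standard library.
class CachingNormal {
public:
    CachingNormal(double mean, double sd) : mean_(mean), sd_(sd) {}

    // Throw away any cached value, analogous to the standard reset().
    void reset() { has_saved_ = false; }

    // Number of (u1, u2) pairs drawn from the engine so far.
    int pairs_generated() const { return pairs_; }

    template <class Engine>
    double operator()(Engine& eng)
    {
        if (has_saved_) {
            // Reuse the second half of the previous pair: no engine output needed.
            has_saved_ = false;
            return mean_ + sd_ * saved_;
        }
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        double u1;
        do {
            u1 = uniform(eng); // reject 0 so that log(u1) stays finite
        } while (u1 <= 0.0);
        const double u2 = uniform(eng);
        const double r = std::sqrt(-2.0 * std::log(u1));
        const double theta = 2.0 * std::acos(-1.0) * u2;
        ++pairs_;
        saved_ = r * std::sin(theta); // keep one standard normal for the next call
        has_saved_ = true;
        return mean_ + sd_ * r * std::cos(theta);
    }

private:
    double mean_;
    double sd_;
    double saved_ = 0.0;
    bool has_saved_ = false;
    int pairs_ = 0;
};
```

Drawing four values without resetting consumes two pairs of uniforms from the engine; calling reset() before each of the four draws consumes four pairs, which is the halving of efficiency described above.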

score 2 · Accepted Answer

Some distributions have internal state. If you interfere with how the distribution works by constantly resetting it you won't get properly distributed results. This is just like calling srand() before every call to rand().

score 1 · Accepted Answer

Calling reset() on a distribution object d has the following effect:

Subsequent uses of d do not depend on values produced by any engine prior to invoking reset.

(In short, an engine is a generator that can be seeded.)

In other words, it clears any "cached" random data that the distribution object has stored and that depends on output that it has previously drawn from an engine.

So, if you want to do that then you should call reset(). The main reason I can think of that you would want to do that is when you are seeding your engine with a known value with the intention of producing repeatable pseudo-random results. If you want the results from your distribution object to also be repeatable based on that seed, then you need to reset the distribution object (or create a new one).
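A minimal sketch of the repeatable-results case, using the real std::normal_distribution and std::mt19937_64. The helper names draw and restart are hypothetical. On implementations that cache a value, drawing an odd number of values can leave one cached in the distribution, which is why restart reseeds the engine and also calls reset().

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw n values from dist using eng.
std::vector<double> draw(std::normal_distribution<double>& dist,
                         std::mt19937_64& eng, std::size_t n)
{
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(dist(eng));
    }
    return out;
}

// Hypothetical helper: return engine and distribution to a known starting
// point. Reseeding alone is not enough if dist holds a cached value.
void restart(std::normal_distribution<double>& dist,
             std::mt19937_64& eng, std::mt19937_64::result_type seed)
{
    eng.seed(seed);
    dist.reset(); // drop anything derived from the old engine output
}
```

With eng seeded with 123, calling draw(dist, eng, 3), then restart(dist, eng, 123), then draw(dist, eng, 3) again yields the same three values both times. Without the reset() call, the second batch could start with a value cached from the first batch.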

Another reason I can think of is that you are defensively reseeding the generator object because you fear that some attacker may gain partial knowledge of its internal state (as for example Fortuna does). To over-simplify, you can imagine that the quality/security of the generator's data diminishes over time, and that reseeding restores it. Since a distribution object can cache arbitrary amounts of data from the generator, there will be an arbitrary delay between increasing the quality/security of the output of the generator, and increasing the quality/security of the output of the distribution object. Calling reset on the distribution object avoids this delay. But I won't swear to this latter use being correct, because it gets into the realms where I prefer not to make my own judgement about what is secure, if I can possibly rely on peer-reviewed work by an expert :-)

With regard to your code in particular -- if you don't want the output to depend on previous use of the same dist object with different generator objects, then calling reset() would be the way to do that. But I think it's unlikely that calling reset on a distribution object and then using it with new parameters will be any cheaper than constructing a new distribution object with those parameters. So using a static local object seems to me to make your function non-thread-safe for no benefit: you could create a new distribution object each time and the code would likely be no worse. There are reasons for the design in the standard, and you're expected to use a distribution object repeatedly with the same generator. The function you've written, cutting the distribution object out of the interface, discards the benefits of that part of the design in the standard.
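A minimal sketch of the two alternatives this suggests. The first rewrites the question's wrapper to construct a local distribution on every call, so there is no static object and no state carries over between calls. The second is a hypothetical NormalSource class (an invented name) that keeps one distribution together with the generator it is used with, as the standard's design intends; it uses the same param_type overload of operator() as the question.

```cpp
#include <random>

// Wrapper without a static object: a fresh distribution per call, so the
// result depends only on the arguments and the generator's current state.
double normal(double mean, double sd, std::mt19937_64& generator)
{
    std::normal_distribution<double> dist(mean, sd);
    return dist(generator);
}

// Hypothetical class keeping the distribution alongside its generator, so
// any value the implementation caches can be used on a later call.
class NormalSource {
public:
    explicit NormalSource(std::mt19937_64::result_type seed) : gen_(seed) {}

    double operator()(double mean, double sd)
    {
        using param_t = std::normal_distribution<double>::param_type;
        return dist_(gen_, param_t(mean, sd));
    }

private:
    std::mt19937_64 gen_;
    std::normal_distribution<double> dist_;
};
```

The first form gives up whatever caching the implementation does, in exchange for calls that are independent of each other; the second keeps the caching but ties the distribution to one generator.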
