c++ - Random number generator that produces a power-law distribution?

Question

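I'm writing some tests for a C++ command-line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get a few numbers very frequently, but most of them relatively infrequently.

Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy-to-use chunk of C/C++ would be great.

Thanks!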

score 39 · Accepted Answer

This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).

The short answer (derivation at the above link):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
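A minimal C++ sketch of that formula, assuming a C++11 compiler and the <random> header. The function names power_law_sample and power_law_int are made up for illustration, and the integer variant (flooring a variate drawn on [lo, hi + 1)) is just one possible way to get the integers the question asks for, not something taken from the MathWorld page. The formula needs n != -1 and, for negative n, x0 > 0.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draw one variate on [x0, x1] whose density is proportional to x^n,
// using the inverse-transform formula quoted above.
template <class Engine>
double power_law_sample(Engine& eng, double x0, double x1, double n) {
    assert(x0 > 0.0 && x1 > x0 && n != -1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double y = uniform(eng);              // uniform variate in [0, 1)
    const double a = std::pow(x0, n + 1.0);     // x0^(n+1)
    const double b = std::pow(x1, n + 1.0);     // x1^(n+1)
    return std::pow((b - a) * y + a, 1.0 / (n + 1.0));
}

// Hypothetical helper: integer in [lo, hi], obtained by flooring a
// continuous variate drawn on [lo, hi + 1).
template <class Engine>
long power_law_int(Engine& eng, long lo, long hi, double n) {
    const double x = power_law_sample(eng, static_cast<double>(lo),
                                      static_cast<double>(hi) + 1.0, n);
    long k = static_cast<long>(std::floor(x));
    if (k < lo) k = lo;                         // guard against rounding error
    if (k > hi) k = hi;
    return k;
}
```

With a std::mt19937 engine, power_law_int(eng, 1, 100, -2.5) should return 1 most of the time and large values only rarely.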

score 21 · Accepted Answer

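If you know the distribution you want (called the probability distribution function (PDF)) and have it properly normalized, you can integrate it to get the cumulative distribution function (CDF), then invert the CDF (if possible) to get the transformation you need from a uniform [0,1] distribution to the one you want.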

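So you start by defining the distribution you want.

P = F(x)

(for x in [0,1]), then integrate to give

C(y) = \int_0^y F(x) dx

If this can be inverted, you get

y = C^{-1}(u)

So call rand(), scale the result to [0,1], plug it in as u in the last line, and use y.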

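This result is called the fundamental theorem of sampling. It is a hassle because of the normalization requirement and the need to invert the function analytically.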
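As a worked example of this recipe, here is how the formula in the top answer appears to fall out for a pure power law on [x_0, x_1]. The symbols follow that answer (x for the power-law variate, y for the uniform number), and the derivation assumes n is not -1.

```latex
% Normalized power-law PDF on [x_0, x_1], with n \neq -1
p(x) = A\,x^{n}, \qquad A = \frac{n+1}{x_1^{n+1} - x_0^{n+1}}

% Integrate to get the CDF
C(x) = \int_{x_0}^{x} A\,t^{n}\,dt = \frac{x^{n+1} - x_0^{n+1}}{x_1^{n+1} - x_0^{n+1}}

% Set C(x) = y for a uniform y in [0,1] and solve for x
x = \left[ \left( x_1^{n+1} - x_0^{n+1} \right) y + x_0^{n+1} \right]^{1/(n+1)}
```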

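Alternatively, you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare it to the PDF at the location specified by your first throw. Reject if the second throw exceeds the PDF. This tends to be inefficient for PDFs with a lot of low-probability region, like those with long tails...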
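The rejection technique could be sketched in C++ roughly as follows. It assumes an unnormalized density x^n with n < 0 on [x0, x1], so the density peaks at x0 and that peak can serve as the upper bound for the second throw. The function name power_law_reject is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Rejection sampling for the unnormalized density f(x) = x^n on [x0, x1].
// With n < 0 the density is decreasing, so its maximum is f(x0).
template <class Engine>
double power_law_reject(Engine& eng, double x0, double x1, double n) {
    assert(x0 > 0.0 && x1 > x0 && n < 0.0);
    std::uniform_real_distribution<double> pick_x(x0, x1);
    std::uniform_real_distribution<double> pick_h(0.0, std::pow(x0, n));
    for (;;) {
        const double x = pick_x(eng);   // first throw: candidate position
        const double h = pick_h(eng);   // second throw: height under the bound
        if (h <= std::pow(x, n)) {
            return x;                   // accept: the height is under the density
        }
        // otherwise reject and throw again
    }
}
```

For x0 = 1, x1 = 10 and n = -2, only about one candidate in ten is accepted, which illustrates the inefficiency mentioned above; a longer tail makes it worse.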

An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.
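For integer outputs, the lookup-table idea might look like the sketch below. It assumes the values 1..max_value with weights k^n, stores the running normalized sum, and uses a binary search for the reverse lookup. The class name PowerLawTable is invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Tabulated CDF for the integers 1..max_value with weights k^n.
class PowerLawTable {
public:
    PowerLawTable(int max_value, double n) : cdf_(max_value) {
        assert(max_value >= 1);
        double running = 0.0;
        for (int k = 1; k <= max_value; ++k) {
            running += std::pow(static_cast<double>(k), n);
            cdf_[k - 1] = running;              // unnormalized cumulative sum
        }
        for (double& c : cdf_) c /= running;    // normalize: last entry becomes 1
    }

    template <class Engine>
    int operator()(Engine& eng) const {
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        const double u = uniform(eng);
        // Reverse lookup: first entry whose cumulative probability exceeds u.
        auto it = std::upper_bound(cdf_.begin(), cdf_.end(), u);
        if (it == cdf_.end()) --it;             // guard against rounding at the top
        return static_cast<int>(it - cdf_.begin()) + 1;
    }

private:
    std::vector<double> cdf_;
};
```

Building the table costs one pass over the range and each draw is a binary search. C++11's std::discrete_distribution accepts a list of weights and could stand in for the hand-rolled table.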

The real stinker here is that simple x^-n distributions are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...
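For the shifted form, the CDF can be inverted by hand. Assuming a density proportional to (x+1)^-n on [0,1] with n != 1, the CDF is C(x) = (1 - (x+1)^(1-n)) / (1 - 2^(1-n)), which gives the sketch below. The function name shifted_power_law is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Inverse-CDF sampling for a density proportional to (x+1)^(-n) on [0, 1].
template <class Engine>
double shifted_power_law(Engine& eng, double n) {
    assert(n != 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double u = uniform(eng);               // uniform variate in [0, 1)
    const double top = std::pow(2.0, 1.0 - n);   // (x+1)^(1-n) evaluated at x = 1
    // Solve C(x) = u for x.
    return std::pow(1.0 - u * (1.0 - top), 1.0 / (1.0 - n)) - 1.0;
}
```

Checking the endpoints: u = 0 maps to x = 0, and u approaching 1 maps to x approaching 1.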

score 5 · Accepted Answer

I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.

One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is that the power-law exponent n (typically denoted alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, with a magnitude typically between 2 and 3 (that is, alpha between -3 and -2).

x0 and x1 stand for the lower and upper limits of the distribution.

So here it is:

set.seed(0)
x1 = 5           # Maximum value
x0 = 0.1         # It can't be zero; otherwise x0^(negative power) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e7)   # Number of samples
x  = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="density x", col=2)

or plotted in logarithmic scale:

plot(density(x), log="xy", ylab="log density x", col=2)

Here is the summary of the data:

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1000  0.1208  0.1584  0.2590  0.2511  4.9388

score 3 · Accepted Answer

I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.

The included distribution presets are:

uniform_int
bernoulli_distribution
geometric_distribution
poisson_distribution
binomial_distribution
uniform_real
exponential_distribution
normal_distribution
gamma_distribution

When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
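One way this plugging-in might look is sketched below. It assumes the C++11 version of <random> (std::uniform_real_distribution rather than the TR1 uniform_real) and reuses the inverse-transform formula from the accepted answer. The class name power_law_distribution is hypothetical and is not part of the standard library; the sketch only mimics the call interface of the standard distributions (result_type, operator() taking an engine, min, max).

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical distribution class: density proportional to x^n on [x0, x1].
class power_law_distribution {
public:
    typedef double result_type;

    power_law_distribution(double x0, double x1, double n)
        : x0_(x0), x1_(x1), n_(n) {
        assert(x0 > 0.0 && x1 > x0 && n != -1.0);
    }

    // Works with any standard engine, e.g. std::mt19937.
    template <class Engine>
    result_type operator()(Engine& eng) {
        const double y = uniform_(eng);             // uniform variate in [0, 1)
        const double a = std::pow(x0_, n_ + 1.0);
        const double b = std::pow(x1_, n_ + 1.0);
        return std::pow((b - a) * y + a, 1.0 / (n_ + 1.0));
    }

    result_type min() const { return x0_; }
    result_type max() const { return x1_; }

private:
    double x0_, x1_, n_;
    std::uniform_real_distribution<double> uniform_{0.0, 1.0};
};
```

Usage would follow the usual pattern: construct an engine such as std::mt19937 eng(seed), construct power_law_distribution dist(0.1, 5.0, -2.5), and call dist(eng) for each sample.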

Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)
