31

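I'm writing some tests for a C++ command-line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. That is, a few numbers come up very frequently, but most of them are relatively infrequent.

Ideally there would be some magic equation I could use with rand() or one of the stdlib random functions. If not, an easy-to-use chunk of C/C++ would be great.

Thanks!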

4

4 Answers

39

This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).

The short answer (derivation at the above link):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
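
As a rough illustration, the formula above might be coded in C++ along these lines. This is only a sketch: the helper names power_law_sample and power_law_int are made up here, it assumes 0 < x0 < x1 and n != -1, and flooring a continuous draw is just one possible way to get integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Draws one variate in [x0, x1] with density proportional to x^n, using the
// closed-form expression quoted above. Requires 0 < x0 < x1 and n != -1.
// (power_law_sample is a hypothetical helper name, not a library function.)
template <class Rng>
double power_law_sample(Rng& rng, double x0, double x1, double n) {
    assert(x0 > 0.0 && x1 > x0 && n != -1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // y in [0, 1)
    const double y = uniform(rng);
    const double a = std::pow(x0, n + 1.0);
    const double b = std::pow(x1, n + 1.0);
    return std::pow((b - a) * y + a, 1.0 / (n + 1.0));
}

// Integer variant (an assumption of this sketch): floor a continuous draw
// from [lo, hi + 1) so that every integer in [lo, hi] can occur.
template <class Rng>
long power_law_int(Rng& rng, long lo, long hi, double n) {
    const double x = power_law_sample(rng, static_cast<double>(lo),
                                      static_cast<double>(hi) + 1.0, n);
    const long k = static_cast<long>(std::floor(x));
    return std::min(std::max(k, lo), hi);  // clamp against rounding at the edges
}
```

With a negative n such as -2.5, small integers dominate and large ones appear only occasionally, which is the long-tail behaviour asked for.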

Answered 2009-05-28T01:40:46.650
21

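If you know the distribution you want (its probability density function, PDF) and have it properly normalized, you can integrate it to get the cumulative distribution function (CDF), then invert the CDF (if possible) to get the transformation you need from a uniform [0,1] distribution to the one you want.

So you start by defining the distribution you want.

P = F(x)

(for x in [0,1]), then integrate to give

C(y) = \int_0^y F(x) dx

If this can be inverted, you get

y = C^{-1}(u)

So call rand(), scale the result to [0,1], plug it in as u in the last line, and use y.

This result is called the fundamental theorem of sampling (the technique is also known as inverse transform sampling). It is a hassle because of the normalization requirement and the need to invert the function analytically.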

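Alternatively, you can use a rejection technique: throw a number uniformly in the desired range, then throw another number uniformly between 0 and the maximum of the PDF and compare it to the PDF at the location indicated by your first throw. Reject if the second throw exceeds the PDF. This tends to be inefficient for PDFs with a lot of low-probability region, like those with long tails...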
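
A minimal sketch of the rejection technique, assuming a density proportional to x^n on [x0, x1] with n < 0 (so the maximum sits at x0). The name rejection_sample is hypothetical, and the density does not need to be normalized for this to work.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Rejection sampling for a density proportional to x^n on [x0, x1], n < 0.
// The density is largest at x0, so pdf_max = x0^n bounds it from above.
// (rejection_sample is a hypothetical helper name.)
template <class Rng>
double rejection_sample(Rng& rng, double x0, double x1, double n) {
    assert(x0 > 0.0 && x1 > x0 && n < 0.0);
    const double pdf_max = std::pow(x0, n);
    std::uniform_real_distribution<double> pick_x(x0, x1);
    std::uniform_real_distribution<double> pick_h(0.0, pdf_max);
    for (;;) {
        const double x = pick_x(rng);       // first throw: candidate position
        const double h = pick_h(rng);       // second throw: height to compare
        if (h <= std::pow(x, n)) return x;  // under the curve: accept
        // above the curve: reject and throw again
    }
}
```

For x0 = 1, x1 = 10, n = -2 only about one throw in ten is accepted, which illustrates the inefficiency on long tails.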

An intermediate approach involves inverting the CDF by brute force: you store the CDF as a lookup table, and do a reverse lookup to get the result.
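
For integer output, the lookup-table idea could look roughly like this. It is a sketch under assumptions: the values are the integers 1..max_value, the weights are k^n, and the class name PowerLawTable is invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Tabulated CDF over the integers 1..max_value with weights k^n
// (use n < 0 for a long tail). PowerLawTable is a hypothetical name.
class PowerLawTable {
public:
    PowerLawTable(int max_value, double n) {
        assert(max_value >= 1);
        cdf_.reserve(static_cast<std::size_t>(max_value));
        double running = 0.0;
        for (int k = 1; k <= max_value; ++k) {
            running += std::pow(static_cast<double>(k), n);
            cdf_.push_back(running);  // unnormalized cumulative sum
        }
    }

    template <class Rng>
    int operator()(Rng& rng) const {
        // Drawing u in [0, total) avoids having to normalize the table.
        std::uniform_real_distribution<double> uniform(0.0, cdf_.back());
        const double u = uniform(rng);
        // Reverse lookup: first entry whose cumulative weight exceeds u.
        auto it = std::upper_bound(cdf_.begin(), cdf_.end(), u);
        if (it == cdf_.end()) --it;  // guard in case u lands exactly on the total
        return static_cast<int>(it - cdf_.begin()) + 1;
    }

private:
    std::vector<double> cdf_;
};
```

The table costs O(max_value) memory and each draw is a binary search. std::discrete_distribution in <random> also accepts a list of weights and may be a simpler route to the same result.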


The real stinker here is that simple x^-n distributions with n >= 1 are non-normalizable on the range [0,1] (the integral diverges at x = 0), so you can't use the sampling theorem. Try (x+1)^-n instead...
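
Working the (x+1)^-n suggestion through by hand (for n != 1) gives a CDF that inverts in closed form. The sketch below uses that inverse; the name shifted_power_sample is hypothetical and the derivation in the comments is worth checking before relying on it.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Inverse-CDF sampling for a density proportional to (x + 1)^(-n) on [0, 1].
// CDF:     C(x) = ((x + 1)^(1 - n) - 1) / (2^(1 - n) - 1)
// Inverse: x    = (1 + u * (2^(1 - n) - 1))^(1 / (1 - n)) - 1
// Requires n != 1. (shifted_power_sample is a hypothetical helper name.)
template <class Rng>
double shifted_power_sample(Rng& rng, double n) {
    assert(n != 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double u = uniform(rng);                     // uniform variate in [0, 1)
    const double span = std::pow(2.0, 1.0 - n) - 1.0;  // normalization term
    return std::pow(1.0 + u * span, 1.0 / (1.0 - n)) - 1.0;
}
```

The result lies in [0, 1]; scaling and rounding it is one way to map it onto a range of integers.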

Answered 2009-05-28T01:24:31.567
5

I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.

One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is that the power-law exponent n (typically denoted alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, typically between -2 and -3.

x0 and x1 stand for the lower and upper limits of the distribution.

So here it is:

set.seed(0)
x1 = 5           # Maximum value
x0 = 0.1         # It can't be zero; otherwise x0^(negative power) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e7)   # Number of samples
x  = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="density", col=2)


or plotted in logarithmic scale:

plot(density(x), log="xy", ylab="log density x", col=2)


Here is the summary of the data:

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1208  0.1584  0.2590  0.2511  4.9388
Answered 2017-10-27T22:52:29.403
3

I can't comment on the math required to produce a power law distribution (the other posts have suggestions) but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.

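The included distribution presets (TR1 names; C++11 renamed uniform_int and uniform_real to uniform_int_distribution and uniform_real_distribution) are: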

  • uniform_int
  • bernoulli_distribution
  • geometric_distribution
  • poisson_distribution
  • binomial_distribution
  • uniform_real
  • exponential_distribution
  • normal_distribution
  • gamma_distribution

When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
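
One way to read "plug it in" is to write a small class that follows the same calling convention as the standard distributions: construct it with parameters, then call it with any engine. The sketch below assumes the C++11 <random> header and a density proportional to x^n on [x0, x1]; power_law_distribution is a hypothetical name and is not part of the standard library.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Distribution-style class mirroring how the standard distributions are used.
// (power_law_distribution is NOT part of <random>; the name is hypothetical.)
class power_law_distribution {
public:
    using result_type = double;

    // Density proportional to x^n on [x0, x1]; requires 0 < x0 < x1, n != -1.
    power_law_distribution(double x0, double x1, double n)
        : a_(std::pow(x0, n + 1.0)),
          b_(std::pow(x1, n + 1.0)),
          inv_(1.0 / (n + 1.0)) {
        assert(x0 > 0.0 && x1 > x0 && n != -1.0);
    }

    // Works with any standard engine (std::mt19937, std::minstd_rand, ...).
    template <class Engine>
    result_type operator()(Engine& engine) {
        const double y = uniform_(engine);  // uniform variate in [0, 1)
        return std::pow((b_ - a_) * y + a_, inv_);
    }

private:
    double a_, b_, inv_;
    std::uniform_real_distribution<double> uniform_{0.0, 1.0};
};
```

Usage would then look like the built-in presets: create an engine such as std::mt19937 engine(std::random_device{}()), create power_law_distribution dist(1.0, 100.0, -2.5), and call dist(engine) for each sample.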

Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F)

Answered 2009-05-28T02:51:24.900