math - On average, how many times will this incorrect loop iterate?

Question

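In some cases, a loop needs to run for a random number of iterations ranging from min to max, inclusive. One working solution is to do something like this: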

int numIterations = randomInteger(min, max);
for (int i = 0; i < numIterations; i++) {
   /* ... fun and exciting things! ... */
}

One common mistake made by many beginning programmers is to do this:

for (int i = 0; i < randomInteger(min, max); i++) {
   /* ... fun and exciting things! ... */
}

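This recomputes the loop upper bound on every iteration.

I suspect that this does not give a uniform distribution over the number of loop iterations between min and max, but I'm not sure exactly what distribution you do get when you do something like this. Does anyone know what the distribution of the number of loop iterations will be?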
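To get an empirical feel for the distribution, the buggy loop can be simulated directly. The following is a minimal Python sketch, not code from the question: `buggy_loop_iterations` and `simulate` are hypothetical names, and `random.Random.randint` stands in for `randomInteger`, on the assumption that `randomInteger` is uniform and inclusive at both ends.

```python
import random
from collections import Counter

def buggy_loop_iterations(lo, hi, rng):
    """Count iterations of `for (i = 0; i < randomInteger(lo, hi); i++)`."""
    i = 0
    # The bound is redrawn before every iteration, as in the buggy loop.
    # rng.randint is inclusive on both ends (assumed to match randomInteger).
    while i < rng.randint(lo, hi):
        i += 1
    return i

def simulate(lo, hi, trials, seed=0):
    """Histogram of iteration counts over `trials` runs of the buggy loop."""
    rng = random.Random(seed)
    return Counter(buggy_loop_iterations(lo, hi, rng) for _ in range(trials))

if __name__ == "__main__":
    trials = 100_000
    counts = simulate(0, 2, trials)
    for k in sorted(counts):
        print(k, counts[k] / trials)
```

A uniform distribution would put roughly equal mass on every count, so any clear skew in the printed frequencies supports the suspicion above.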

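As a specific example: suppose that min = 0 and max = 2. Then there are the following possibilities:

When i = 0, the random value is 0. The loop runs 0 times.
When i = 0, the random value is nonzero. Then:
- When i = 1, the random value is 0 or 1. Then the loop runs 1 time.
- When i = 1, the random value is 2. Then the loop runs 2 times.

The probability of the first event is 1/3. The second event has probability 2/3, and within it, the first subcase has probability 2/3 and the second subcase has probability 1/3. Therefore, the average number of iterations is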

0 × 1/3 + 1 × 2/3 × 2/3 + 2 × 2/3 × 1/3

= 0 + 4/9 + 4/9

= 8/9

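Note that if the distribution were indeed uniform, we would expect to get 1 loop iteration, but now we only get 8/9 on average. My question is whether it's possible to generalize this result to get a more exact value for the number of iterations.

Thanks!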
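The case analysis above can be mechanized for any min and max. Below is a minimal Python sketch using exact fractions; `iteration_distribution` and `expected_iterations` are hypothetical names, and the code assumes a uniform draw on [lo, hi] with lo >= 0.

```python
from fractions import Fraction

def iteration_distribution(lo, hi):
    """Exact P(loop runs k times) for k = 0..hi, assuming a uniform draw on [lo, hi]."""
    size = hi - lo + 1
    probs = {}
    reach = Fraction(1)  # probability that the loop test is evaluated with i == k
    for k in range(hi + 1):
        # The loop stops at i == k when the fresh draw is <= k.
        stop = Fraction(max(0, k - lo + 1), size)
        probs[k] = reach * stop
        reach *= 1 - stop
    return probs

def expected_iterations(lo, hi):
    """Mean iteration count, as an exact fraction."""
    return sum(k * p for k, p in iteration_distribution(lo, hi).items())

print(expected_iterations(0, 2))  # 8/9
```

For min = 0 and max = 2 this reproduces the probabilities 1/3, 4/9 and 2/9 for 0, 1 and 2 iterations.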

score 5 · Accepted Answer

Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max is given below.

[Figure: mean number of iterations plotted against max]

Interestingly, the growth in the mean number of iterations tails off as you increase max. It would be interesting if someone else could confirm this with their own code.
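One independent check, sketched in Python under the assumption that min = 0 and the draw is uniform: the mean equals the sum over k >= 1 of P(X >= k), and P(X >= k) is just the probability that the loop test passes k times in a row. `mean_iterations` is a hypothetical name. The square-root column is my own rough reference, suggested by the resemblance to the birthday problem; it is not something the answer states.

```python
import math

def mean_iterations(maximum):
    """Mean iteration count for min = 0, via E[X] = sum over k >= 1 of P(X >= k)."""
    n = maximum + 1
    mean = 0.0
    survive = 1.0  # P(X >= k): the loop test passed for i = 0..k-1
    for k in range(1, maximum + 1):
        # The test at i = k-1 passes when the draw exceeds k-1.
        survive *= (n - k) / n
        mean += survive
    return mean

for m in (2, 10, 100, 1000):
    # Columns: max, exact mean, sqrt(pi * (max + 1) / 2) as a rough reference.
    print(m, round(mean_iterations(m), 3), round(math.sqrt(math.pi * (m + 1) / 2), 3))
```

If the mean grows much more slowly than max itself, that matches the flattening seen in the plot.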

If I were to start modelling this, I would start with the geometric distribution and try to modify it. Essentially we're looking at a discrete, bounded distribution: zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes from one iteration to the next. (Also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base.)

Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term. These correspond to the loop condition evaluating to true k times (the loop keeps going) and then to false once (the loop stops). Note that this holds even for the last probability, k = max, as the chance of stopping is then 1, which obviously makes no difference to a product.

Following on from this, an attempt to implement this in code, in R, looks like this:

fx = function(k,maximum)
{
    n=maximum+1;
    failure = factorial(n-1)/factorial(n-1-k) / n^k;
    success = (k+1) / n;
    failure * success
}

This assumes min=0, but generalizing to arbitrary mins isn't difficult (see my comment on the OP). To explain the code: first, as shown by the OP, the probabilities all have (max+1) as a denominator, so we calculate the denominator, n. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k) means, for example, for max=2, n=3 and k=2: 2*1. It generalises to give you (n-1)(n-2)... as the numerator, which is divided by n^k to give the total probability of k failures. The probability of success increases as you get further into the loop, until finally, when k=maximum, it is 1.

Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.

[Figure: bar plots of P(X=x) for max = 2, 10 and 100]

Incidentally the R code to do this is as follows

plot_probability_mass_function = function(maximum)
{
    x=0:maximum;
    barplot(fx(x,max(x)), names.arg=x, main=paste("max",maximum), ylab="P(X=x)");
}

par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)

Mathematically, the distribution is, if I've got my maths right, given by (with m = max and min = 0):

$$P(X = x) = \frac{x+1}{m+1} \prod_{i=0}^{x-1} \frac{m-i}{m+1}$$

which simplifies to

$$P(X = x) = \frac{m!\,(x+1)}{(m-x)!\,(m+1)^{x+1}}$$

(thanks a bunch to http://www.codecogs.com/latex/eqneditor.php)

The latter is given by the R function

f = function(x,m) { factorial(m)*(x+1)/(factorial(m-x)*(m+1)^(x+1)) }
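As a cross-check, the same closed form can be evaluated with exact arithmetic. This is a minimal Python sketch rather than part of the original answer; `pmf` is a hypothetical name, and min = 0 is assumed as above.

```python
from fractions import Fraction
from math import factorial

def pmf(x, m):
    """P(X = x) for min = 0 and max = m, mirroring the simplified R formula."""
    return Fraction(factorial(m) * (x + 1), factorial(m - x) * (m + 1) ** (x + 1))

m = 2
probs = [pmf(x, m) for x in range(m + 1)]
print(sum(probs))                               # 1
print(sum(x * p for x, p in enumerate(probs)))  # 8/9
```

The probabilities sum to exactly 1, and the mean for max = 2 matches the 8/9 worked out in the question.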

Plotting the mean number of iterations is done like this in R

# Mean number of iterations for min = 0 and the given max.
meanf = function(maximum)
{
    x = 0:maximum
    probs = f(x,maximum)
    x %*% probs    # dot product: sum of x * P(X = x)
}

par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")

score 2 · Accepted Answer

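Here are some concrete results that I plotted with matplotlib. The X axis is the value that i reached. The Y axis is the number of times that value was reached.

The distribution is clearly not uniform. I don't know what distribution it is; my statistics knowledge is very rusty.

1. min = 10, max = 20, iterations = 100,000

2. min = 100, max = 200, iterations = 100,000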

score 0 · Accepted Answer

I don't know the math behind it, but I know how to compute it! In Haskell:

import Numeric.Probability.Distribution

iterations min max = iteration 0
  where
  iteration i = do
    x <- uniform [min..max]
    if i < x
      then iteration (i + 1)
      else return i

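Now expected (iterations 0 2) gives you an expected value of ~0.89. Maybe someone with the requisite maths knowledge can explain what I'm actually doing here. Because you start at 0, the loop will always run at least min times.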

score 0 · Accepted Answer

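I believe that, given a sufficient number of executions, it would still conform to the distribution of the randomInteger function.

But this is probably a question better suited to be asked on MATHEMATICS.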
