Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max
is given below.
Interestingly, the number of iterations tails off as you increase max. Would be interesting if someone else could confirm this with their code.
If I were to start modelling this, I would start with the geometric distribution, and try to modify that. Essentially we're looking at a discrete, bounded distribution. So we have zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes (also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base). Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term, corresponding to k "false"s on the loop condition and 1 "true". (Note that this holds even to calculate the last probability, as the chance of stopping is then 1, which obviously makes no difference to a product).
Following on from this, an attempt to implement this in code, in R, looks like this:
fx = function(k,maximum)
{
n=maximum+1;
failure = factorial(n-1)/factorial(n-1-k) / n^k;
success = (k+1) / n;
failure * success
}
This assumes min=0, but generalizing to arbitrary min
s isn't difficult (see my comment on the OP). To explain the code. First, as shown by the OP, the probabilities all have (min+1)
as a denominator, so we calculate the denominator, n
. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k)
means, for example, for min=2, n=3 and k=2: 2*1. And it generalises to give you (n-1)(n-2)... for the total probability of failure. The probability of success increases as you get further into the loop, until finally, when k=maximum
, it is 1.
Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.
Incidentally the R code to do this is as follows
plot_probability_mass_function = function(maximum)
{
x=0:maximum;
barplot(fx(x,max(x)), names.arg=x, main=paste("max",maximum), ylab="P(X=x)");
}
par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)
Mathematically, the distribution is, if I've got my maths right, given by:
which simplifies to
(thanks a bunch to http://www.codecogs.com/latex/eqneditor.php)
The latter is given by the R function
function(x,m) { factorial(m)*(x+1)/(factorial(m-x)*(m+1)^(x+1)) }
Plotting the mean number of iterations is done like this in R
meanf = function(minimum)
{
x = 0:minimum
probs = f(x,minimum)
x %*% probs
}
meanf = function(maximum)
{
x = 0:maximum
probs = f(x,maximum)
x %*% probs
}
par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")