I'm trying to calculate the probability of a bivariate normal distribution over a specific area, namely a specific polygon, in Java.
The mathematical description would be to integrate the probability density function (pdf) of the bivariate normal distribution over a specific complex area.
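Written out, the quantity described above looks like this. The symbols $A$ (the area), $\boldsymbol\mu$ (the mean vector) and $\Sigma$ (the covariance matrix) are notation assumed here for illustration; the density is the standard bivariate normal pdf.

```latex
P\big((X,Y)\in A\big) = \iint_A f(x,y)\,dx\,dy,
\qquad
f(\mathbf{z}) = \frac{1}{2\pi\sqrt{\det\Sigma}}
\exp\!\Big(-\tfrac{1}{2}(\mathbf{z}-\boldsymbol\mu)^{\mathsf T}\,\Sigma^{-1}\,(\mathbf{z}-\boldsymbol\mu)\Big),
\quad \mathbf{z} = (x,y)^{\mathsf T}
```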
My first approach was to use two NormalDistribution objects with the aid of the apache-commons-math library. Given dataset x for dimension 1 and dataset y for dimension 2, I've computed the mean and standard deviation for each NormalDistribution.
With the method public double probability(double x0, double x1) from org.apache.commons.math3.distribution.NormalDistribution I'm able to set an individual interval for each dimension, which means I can define a rectangular area and get its probability as the product of the two interval probabilities (this treats x and y as independent):
NormalDistribution normalX = new NormalDistribution(means[0], stdDeviation_x);
NormalDistribution normalY = new NormalDistribution(means[1], stdDeviation_y);
double probabilityOfRect = normalX.probability(x1, x2) * normalY.probability(y1, y2);
If the standard deviations are small enough and the defined region is large enough, the probability approaches 1.0 (e.g. 0.99999999999), which is expected.
As I've said, I need the probability over an arbitrary area, so this first approach won't work, because it only lets me define rectangular areas.
So my second approach was to use the class MultivariateNormalDistribution, which is also implemented in apache-commons-math.
By using the MultivariateNormalDistribution with the vector of means and the covariance matrix, I'm able to get the pdf at a specific point x with public double density(double[] vals). As its documentation says:
Returns the probability density function (PDF) of this distribution evaluated at the specified point x.
In this approach I convert my complex area into an ArrayList of Points and then sum up all the densities by iterating over the ArrayList like this:
MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
double sum = 0.0;
for (Point p : complexArea) {
    double[] pos = {p.x, p.y};
    sum += mnd.density(pos);
}
return sum;
But I've run into a problem when the standard deviations are set to very low values: the pdf then has peaks > 1 at the positions where I call mnd.density(pos), so the sum adds up to values > 1.
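One possible reading of why the sum exceeds 1: a density is a probability per unit area, so each sampled value has to be weighted by the area of the cell it represents, and the cells have to be small compared with the standard deviations. The following is a minimal sketch of such a midpoint Riemann sum, not a definitive implementation. It assumes zero correlation and a mean at the origin, and it writes the density out by hand so that it needs no external library; the class and method names (GridSum, density, riemannSum) are hypothetical.

```java
class GridSum {
    // Bivariate normal density with zero correlation (an assumption of this sketch),
    // written out by hand so that no external library is required.
    static double density(double x, double y, double mx, double my, double sx, double sy) {
        double zx = (x - mx) / sx;
        double zy = (y - my) / sy;
        return Math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * Math.PI * sx * sy);
    }

    // Midpoint Riemann sum over the square [-half, half] x [-half, half]
    // with cell width h, for a distribution centred on the origin.
    static double riemannSum(double half, double h, double sx, double sy) {
        int n = (int) Math.round(2.0 * half / h); // number of cells per axis
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double x = -half + (i + 0.5) * h; // centre of cell (i, j)
                double y = -half + (j + 0.5) * h;
                // density times cell area = approximate probability of this cell
                sum += density(x, y, 0.0, 0.0, sx, sy) * h * h;
            }
        }
        return sum;
    }
}
```

With a cell width well below the standard deviations, for example GridSum.riemannSum(5.0, 0.01, 0.1, 0.1), the sum stays close to the true probability of the square. With a cell width much larger than the standard deviations, the result depends on where the sample points fall relative to the peak and can be far from the true value.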
To avoid these peaks, for each point I average the density over a 10 x 10 grid of surrounding sub-points (in double precision) and then sum up those averages:
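MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
double sum = 0.0;
for (Point p : surfacePoints) {
    double tmpRes = 0.0;
    // Integer counters guarantee exactly 10 x 10 samples; repeatedly adding 0.1
    // to a double can drift and yield 10 or 11 iterations.
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            // centre of each 0.1 x 0.1 sub-cell of the unit cell around p
            double[] pos = {p.x - 0.45 + 0.1 * i, p.y - 0.45 + 0.1 * j};
            tmpRes += mnd.density(pos);
        }
    }
    sum += tmpRes / 100.0;
}
return sum;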
which appears to work.
All in all, I'm not sure whether my approaches are fundamentally correct. Another approach would be to compute the probability by numerical integration, but I don't know how to do this in Java.
Are there any other ways to achieve this?
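To make the numerical-integration idea concrete, here is a hedged sketch that lays a fine grid over the polygon's bounding box, keeps the cells whose centre lies inside the polygon (using java.awt.geom.Path2D from the JDK), and adds up density times cell area. The density is written out by hand from the standard bivariate normal formula with correlation rho, so the sketch does not depend on apache-commons-math. The names PolygonProbability, polygonOf and probability are hypothetical, and the accuracy depends on the cell width h being small compared with the standard deviations.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

class PolygonProbability {
    // Standard bivariate normal density with means (mx, my), standard deviations
    // (sx, sy) and correlation rho, where |rho| < 1.
    static double density(double x, double y, double mx, double my,
                          double sx, double sy, double rho) {
        double zx = (x - mx) / sx;
        double zy = (y - my) / sy;
        double q = 1.0 - rho * rho;
        return Math.exp(-(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * q))
                / (2.0 * Math.PI * sx * sy * Math.sqrt(q));
    }

    // Hypothetical helper: builds a closed polygon from {x, y} vertex pairs.
    static Path2D polygonOf(double[][] vertices) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(vertices[0][0], vertices[0][1]);
        for (int k = 1; k < vertices.length; k++) {
            path.lineTo(vertices[k][0], vertices[k][1]);
        }
        path.closePath();
        return path;
    }

    // Approximates the integral of the density over the polygon with a
    // midpoint rule on square cells of width h.
    static double probability(Path2D polygon, double h, double mx, double my,
                              double sx, double sy, double rho) {
        Rectangle2D box = polygon.getBounds2D();
        int nx = (int) Math.ceil(box.getWidth() / h);
        int ny = (int) Math.ceil(box.getHeight() / h);
        double sum = 0.0;
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                double x = box.getMinX() + (i + 0.5) * h; // cell centre
                double y = box.getMinY() + (j + 0.5) * h;
                if (polygon.contains(x, y)) {
                    sum += density(x, y, mx, my, sx, sy, rho) * h * h;
                }
            }
        }
        return sum;
    }
}
```

A call might look like PolygonProbability.probability(PolygonProbability.polygonOf(vertices), 0.01, meanX, meanY, sdX, sdY, rho), where vertices, meanX, meanY, sdX, sdY and rho are placeholders for your own data. Cells that straddle the polygon boundary are counted all-or-nothing, so the error along the edges shrinks as h gets smaller.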
EDIT:
Apart from the lack of accuracy, the main question is: is the second approach, "summing up the densities", a valid method to obtain the probability over an area of a bivariate normal distribution? Thinking about 1-dimensional normal distributions, the probability of one specific point is always 0. How does the public double density(double[] vals) method in the Apache math library obtain a valid value?
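For the 1-dimensional case mentioned above, the relationship between a density value and a probability can be sketched as follows. This is standard probability theory rather than anything specific to the library; $f$ denotes the pdf, and $x_0$ and $\Delta x$ are notation assumed here. The value returned by a density method is $f(x_0)$, which is not itself a probability and is allowed to exceed 1.

```latex
P(X = x_0) = \int_{x_0}^{x_0} f(x)\,dx = 0,
\qquad
P(x_0 \le X \le x_0 + \Delta x) = \int_{x_0}^{x_0+\Delta x} f(x)\,dx \approx f(x_0)\,\Delta x,
\qquad
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} > 1 \ \text{ when } \ \sigma < \frac{1}{\sqrt{2\pi}} \approx 0.399
```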