
I'm trying to calculate the probability of a bivariate normal distribution over a specific area, namely a specific polygon, in Java.

The mathematical description would be to integrate the probability density function (pdf) of the bivariate normal distribution over a specific complex area.

My first approach was to use two NormalDistribution objects with the aid of the apache-commons-math library. Given dataset x for dimension 1 and dataset y for dimension 2 I've computed mean and standard deviation for each NormalDistribution.

With the method public double probability(double x0, double x1) from org.apache.commons.math3.distribution.NormalDistribution I can set an individual interval for each dimension, which means I can define a rectangular area and get its probability (treating the two dimensions as independent) by

NormalDistribution normalX = new NormalDistribution(means[0], stdDeviation_x);
NormalDistribution normalY = new NormalDistribution(means[1], stdDeviation_y);

double probabilityOfRect = normalX.probability(x1, x2) * normalY.probability(y1, y2);

If the standard deviations are small enough and the defined region is large enough, the probability approaches 1.0 (0.99999999999), which is expected.

Since I need the probability over an arbitrary area, this first approach won't work, because it only lets me define rectangular areas.

So my second approach was to use the class MultivariateNormalDistribution, which is also implemented in apache-commons-math.

By using the MultivariateNormalDistribution with the vector of means and the covariance matrix, I can get the pdf at a specific point x with public double density(double[] vals). As the documentation says:

Returns the probability density function (PDF) of this distribution evaluated at the specified point x.

http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/distribution/MultivariateNormalDistribution.html#density(double[])

In this approach I convert my complex area into an ArrayList of Points and then sum up all the densities by iterating over the ArrayList like this:

MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
double sum = 0.0;
for (Point p : complexArea) {
    double[] pos = {p.x, p.y};
    sum += mnd.density(pos);
}
return sum;

But I've run into a precision problem when setting the standard deviations to very low values: the pdf then has peaks > 1 at the positions where I call mnd.density(pos), so the sum adds up to values > 1.

To avoid these peaks, I average the density over a grid of sub-points (in double precision) surrounding each point, and sum up those averages:

MultivariateNormalDistribution mnd = new MultivariateNormalDistribution(means, covariances);
double sum = 0.0;
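for (Point p : surfacePoints) {
    double tmpRes = 0.0;

    // Integer counters guarantee exactly 10 x 10 samples per point;
    // accumulating x += 0.1 in floating point may run one extra iteration.
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            double[] pos = {p.x - 0.5 + 0.1 * i, p.y - 0.5 + 0.1 * j};
            tmpRes += mnd.density(pos);
        }
    }
    sum += tmpRes / 100.0; // average of the 100 samples
}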

return sum;

This seems to work.

All in all, I'm not sure whether my approaches are fundamentally correct. Another approach would be to compute the probability by numerical integration, but I have no idea how to do this in Java.

Are there any other possibilities to achieve this?

EDIT: Apart from the lack of accuracy, the main question is: is the second approach, "summing up the densities", a valid method for obtaining the probability of an area under a bivariate normal distribution? For a 1-dimensional normal distribution, the probability of any single point is always 0. How does the public double density(double[] vals) method in the Apache math library obtain a valid value?


1 Answer


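Your current approach performs numerical integration by sampling at points with integer coordinates and assigning the value at each point to the whole unit square around it. This has two main sources of error. One is that the function may vary a lot within a square. The other is at the boundary, where you integrate over squares that are not completely contained in the region. A third source of error is rounding, but that is rarely significant because the other sources of error are so large.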

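A simple way to reduce the errors is to use a finer grid. If you sample at points whose coordinates are integers divided by n (and weight each sample by n^-2, the area of a square of side 1/n), this reduces both sources of error. One problem is that you sample at about n^2 times as many points.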
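A minimal sketch of the finer-grid idea, under some assumptions of mine: the polygon is given as vertex arrays px/py, the pdf is passed in as a DoubleBinaryOperator (with Commons Math, a lambda wrapping mnd.density could be used instead), and samples are taken at cell centres rather than at the grid points themselves so that no sample lands exactly on the boundary. All class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

final class GridProbability {

    /** Even-odd (ray casting) point-in-polygon test. */
    static boolean contains(double[] px, double[] py, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
            // Does the edge (j -> i) cross the horizontal ray going right from (x, y)?
            if ((py[i] > y) != (py[j] > y)
                    && x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Sums the pdf at the centres of cells of side 1/n, weighting each by the cell area. */
    static double probability(DoubleBinaryOperator pdf, double[] px, double[] py, int n) {
        // Only cells inside the polygon's bounding box need to be visited.
        long i0 = (long) Math.floor(Arrays.stream(px).min().getAsDouble() * n);
        long i1 = (long) Math.ceil(Arrays.stream(px).max().getAsDouble() * n);
        long j0 = (long) Math.floor(Arrays.stream(py).min().getAsDouble() * n);
        long j1 = (long) Math.ceil(Arrays.stream(py).max().getAsDouble() * n);
        double h = 1.0 / n;
        double sum = 0.0;
        for (long i = i0; i < i1; i++) {
            double x = (i + 0.5) * h;
            for (long j = j0; j < j1; j++) {
                double y = (j + 0.5) * h;
                if (contains(px, py, x, y)) {
                    sum += pdf.applyAsDouble(x, y);
                }
            }
        }
        return sum * h * h; // each sample stands for a cell of area n^-2
    }

    /** Standard bivariate normal pdf with zero correlation, used only for the demo. */
    static double standardPdf(double x, double y) {
        return Math.exp(-0.5 * (x * x + y * y)) / (2.0 * Math.PI);
    }

    public static void main(String[] args) {
        double[] px = {-1, 1, 1, -1};
        double[] py = {-1, -1, 1, 1};
        for (int n : new int[] {1, 10, 100}) {
            System.out.println("n = " + n + ": "
                    + probability(GridProbability::standardPdf, px, py, n));
        }
    }
}
```

The work grows with n^2, so this is mainly useful as a baseline to compare a smarter method against.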

I would suggest writing the double integral over the region as an integral of an integral.

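The inner integral (say, with respect to x) is a 1-dimensional Gaussian integral over an interval if the region is convex, or at worst over a finite list of intervals. You integrate the pdf, restricted to a particular y-coordinate y0, over the intersection of the polygon with the horizontal line y=y0. You can evaluate the inner integral using functions like erf, which are numerically approximated in libraries, or you can do it yourself with 1-dimensional numerical integration.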
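Written out for the convex case, with notation that is mine rather than the question's (means \mu_x, \mu_y, standard deviations \sigma_x, \sigma_y, correlation \rho, and [a(y), b(y)] the chord of the polygon at height y), the decomposition and the closed form of the inner integral would look roughly like this:

```latex
\[
P(A) = \int_{y_{\min}}^{y_{\max}} \left( \int_{a(y)}^{b(y)} f(x, y)\, dx \right) dy
\]
% Factor the joint pdf as f(x, y) = f_Y(y) f_{X \mid Y}(x \mid y), where
% X given Y = y is normal with
\[
\mu_c(y) = \mu_x + \rho\, \frac{\sigma_x}{\sigma_y}\, (y - \mu_y),
\qquad
\sigma_c = \sigma_x \sqrt{1 - \rho^2}.
\]
% The inner integral then reduces to a difference of normal CDF values:
\[
\int_{a(y)}^{b(y)} f(x, y)\, dx
  = f_Y(y) \left[ \Phi\!\left( \frac{b(y) - \mu_c(y)}{\sigma_c} \right)
                - \Phi\!\left( \frac{a(y) - \mu_c(y)}{\sigma_c} \right) \right],
\qquad
\Phi(z) = \frac{1}{2} \left( 1 + \operatorname{erf}\!\left( \frac{z}{\sqrt{2}} \right) \right).
\]
```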

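The outer integral (say, with respect to y) naturally breaks into pieces. The function inside the outer integral may fail to be smooth where the polygon has a vertex. So break up the outer integral at the y-coordinates of the polygon's vertices, and apply a numerical integration rule such as the trapezoid rule or Simpson's rule on each interval. These require you to evaluate the inner integral at a few points on each interval and weight the values appropriately.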
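A sketch of how the two steps could fit together for a convex polygon, not a definitive implementation. Everything named here is hypothetical: the class, the vertex arrays px/py, and the parameters mx, my, sx, sy, rho (means, standard deviations, correlation). The inner integral uses the fact that X given Y = y is again normal; erf is a hand-written Abramowitz and Stegun approximation so the example has no dependencies, and m must be even for Simpson's rule.

```java
import java.util.Arrays;

final class PolygonGaussianIntegral {
    // Hypothetical parameter names: means, standard deviations, correlation.
    private final double mx, my, sx, sy, rho;

    PolygonGaussianIntegral(double mx, double my, double sx, double sy, double rho) {
        this.mx = mx; this.my = my; this.sx = sx; this.sy = sy; this.rho = rho;
    }

    /** erf approximation (Abramowitz and Stegun 7.1.26), absolute error about 1.5e-7. */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(x));
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double r = 1.0 - poly * Math.exp(-x * x);
        return x >= 0 ? r : -r;
    }

    /** Standard normal CDF. */
    static double phi(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** Inner integral: the pdf integrated over x in [a, b] at fixed height y. */
    double inner(double y, double a, double b) {
        double zy = (y - my) / sy;
        double fy = Math.exp(-0.5 * zy * zy) / (sy * Math.sqrt(2.0 * Math.PI)); // marginal pdf of Y
        double mc = mx + rho * sx * zy;              // mean of X given Y = y
        double sc = sx * Math.sqrt(1.0 - rho * rho); // std dev of X given Y = y
        return fy * (phi((b - mc) / sc) - phi((a - mc) / sc));
    }

    /** x-extent {lo, hi} of a convex polygon at height y, or null if the line misses it. */
    static double[] chord(double[] px, double[] py, double y) {
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
            double y0 = py[j], y1 = py[i];
            if (y0 == y1 || y < Math.min(y0, y1) || y > Math.max(y0, y1)) {
                continue; // horizontal edge, or edge does not reach this height
            }
            double x = px[j] + (y - y0) * (px[i] - px[j]) / (y1 - y0);
            lo = Math.min(lo, x);
            hi = Math.max(hi, x);
        }
        return lo <= hi ? new double[] {lo, hi} : null;
    }

    /** Outer integral: Simpson's rule with m (even) steps on each slab between vertex heights. */
    double probability(double[] px, double[] py, int m) {
        double[] ys = Arrays.stream(py).distinct().sorted().toArray();
        double total = 0.0;
        for (int s = 0; s + 1 < ys.length; s++) {
            double ya = ys[s], yb = ys[s + 1], h = (yb - ya) / m;
            double sum = 0.0;
            for (int k = 0; k <= m; k++) {
                double y = (k == m) ? yb : ya + k * h; // hit the slab end exactly
                double[] c = chord(px, py, y);
                double g = (c == null) ? 0.0 : inner(y, c[0], c[1]);
                int w = (k == 0 || k == m) ? 1 : (k % 2 == 1 ? 4 : 2); // Simpson weights
                sum += w * g;
            }
            total += sum * h / 3.0;
        }
        return total;
    }

    public static void main(String[] args) {
        // Triangle with vertices (0,0), (2,0), (1,2) under a correlated normal centred at (1, 0.5).
        PolygonGaussianIntegral g = new PolygonGaussianIntegral(1.0, 0.5, 1.0, 0.5, 0.3);
        System.out.println(g.probability(new double[] {0, 2, 1}, new double[] {0, 0, 2}, 20));
    }
}
```

If Commons Math is on the classpath, the hand-written erf could presumably be swapped for org.apache.commons.math3.special.Erf.erf, or phi for NormalDistribution.cumulativeProbability. A non-convex polygon would need chord to return a list of intervals instead of a single one.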

This should produce more accurate results in a given amount of time than simply refining the grid.

answered 2015-04-29T15:08:35.127