java - Online algorithm for calculating standard deviation

Question

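My actual problem is more technical, but I'll simplify it for you with an example about counting balls.

Suppose I have balls of different colors and an array (initialized to all 0s) with one index reserved for each color. Every time I pick a ball, I increment the corresponding index by 1.

The balls are picked at random, and I can only pick one ball at a time. My sole purpose is to count the number of balls of each color until I run out of balls.

I would like to calculate the standard deviation of the number of balls of the different colors while I am counting them. I do not want to compute it by iterating over the array again after I have finished counting all the balls.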

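To visualize:

Balls in random order: BBGRRYYBBGGGGGGB (each letter is the first letter of a color). Array indices 0 to 3 correspond to the colors B, G, R and Y respectively. When I am done picking balls, my array looks like [5,7,2,2].

Calculating the standard deviation once I have the final array is very simple, but I want to do it while filling the array.

I want to do this in Java, and I have approximately 1000 colors.

What is the most efficient way to implement this? Or is there even a way to do it before having the final array?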

score 9 · Accepted Answer

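You don't need an array to calculate standard deviation.

Simply keep track of the number of points, the total sum, and the total sum of squares. You can calculate the mean and standard deviation at any time, without having to keep an array.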
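For reference, the identity behind this can be written out as follows. This is the standard sample-variance formula (dividing by n - 1), stated here with the symbols n, x_i and s chosen for illustration; it matches the expression used in calculateStandardDeviation() in the class that follows.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{1}{n-1}\left( \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \right),
\qquad
s = \sqrt{s^2}
```

Only n, the sum of the x_i, and the sum of the x_i squared appear on the right-hand side, which is why those three running totals are enough.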

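If I understand your requirements, you'll need a Map where the color is the key and an instance of Statistics is the value.

Here's a class that does it for you.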

package statistics;

/**
 * Statistics
 * @author Michael
 * @link http://stackoverflow.com/questions/11978667/online-algorithm-for-calculating-standrd-deviation/11978689#11978689
 * @since 8/15/12 7:34 PM
 */
public class Statistics {

    private int n;
    private double sum;
    private double sumsq;

    // synchronized like the other methods, so a reset cannot interleave with an update
    public synchronized void reset() {
        this.n = 0;
        this.sum = 0.0;
        this.sumsq = 0.0;
    }

    public synchronized void addValue(double x) {
        ++this.n;
        this.sum += x;
        this.sumsq += x*x;
    }

    public synchronized double calculateMean() {
        double mean = 0.0;
        if (this.n > 0) {
            mean = this.sum/this.n;
        }
        return mean;
    }

    public synchronized double calculateVariance() {
        double variance = 0.0;
        if (this.n > 1) {
            // Sample variance: (sum of squares - sum^2/n) / (n - 1).
            // Clamp at zero so rounding error cannot yield a negative value.
            variance = Math.max(0.0, (this.sumsq - this.sum*this.sum/this.n)/(this.n-1));
        }
        return variance;
    }

    public synchronized double calculateStandardDeviation() {
        // Derive the deviation from the variance rather than the other way round
        return Math.sqrt(calculateVariance());
    }
}

Here is a unit test for it:

package statistics;

import org.junit.Assert;
import org.junit.Test;

/**
 * StatisticsTest
 * @author Michael
 * @link http://www.wolframalpha.com/input/?i=variance%281%2C+2%2C+3%2C+4%2C+5%2C+6%29&a=*C.variance-_*Variance-
 * @since 8/15/12 7:42 PM
 */
public class StatisticsTest {

    private static final double TOLERANCE = 1.0E-9;

    @Test
    public void testCalculateMean() {
        double [] values = new double[] {
            1.0, 2.0, 3.0, 4.0, 5.0, 6.0
        };
        Statistics stats = new Statistics();
        for (double value : values) {
            stats.addValue(value);
        }
        double expected = 3.5;
        Assert.assertEquals(expected, stats.calculateMean(), TOLERANCE);
    }

    @Test
    public void testCalculateVariance() {
        double [] values = new double[] {
                1.0, 2.0, 3.0, 4.0, 5.0, 6.0
        };
        Statistics stats = new Statistics();
        for (double value : values) {
            stats.addValue(value);
        }
        double expected = 3.5;
        Assert.assertEquals(expected, stats.calculateVariance(), TOLERANCE);
    }


    @Test
    public void testCalculateStandardDeviation() {
        double [] values = new double[] {
                1.0, 2.0, 3.0, 4.0, 5.0, 6.0
        };
        Statistics stats = new Statistics();
        for (double value : values) {
            stats.addValue(value);
        }
        double expected = Math.sqrt(3.5);
        Assert.assertEquals(expected, stats.calculateStandardDeviation(), TOLERANCE);
    }

}
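One caveat with the sum-of-squares formula: when the mean is large compared to the spread of the values, subtracting two nearly equal numbers can lose precision. A commonly used alternative is Welford's method, which updates a running mean and a running sum of squared deviations instead. The sketch below is not part of the answer above; the class name WelfordStatistics and its method names are hypothetical, and it assumes the same sample (n - 1) convention as the Statistics class.

```java
/**
 * Sketch of Welford's online method (hypothetical class, not from the answer).
 * Tracks the running mean and the sum of squared deviations from it.
 */
final class WelfordStatistics {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Folds one value into the running totals in O(1). */
    void addValue(double x) {
        n++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / n;         // update the mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    double calculateMean() {
        return n > 0 ? mean : 0.0;
    }

    /** Sample variance (divides by n - 1); 0.0 for fewer than two values. */
    double calculateVariance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }

    double calculateStandardDeviation() {
        return Math.sqrt(calculateVariance());
    }
}
```

Fed the same values as the unit test (1.0 through 6.0), this should give the same mean and variance of 3.5 within the test's tolerance.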

score 1 · Accepted Answer

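Since the mean and standard deviation are calculated using sums, you can easily implement appropriate accumulators for these. Then, when you want the actual values, finish the rest of the calculation (particularly the divisions).

The sum of squares is the tricky part, since you increment one of the frequencies for each input. One way to deal with this is to maintain a count of each color seen so far (using an appropriate data structure). Then, when you see a color in your input, you can subtract its previous square and add the new square back in (or, equivalently, add the difference of the two squares to your accumulator).

I'll leave it to the reader to implement the algorithm described here.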
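One possible way to fill in that exercise is sketched below. It assumes colors are identified by array indices (as in the question), that the sample standard deviation (dividing by n - 1, with n the number of colors) is what is wanted, and that colors never seen still count as zeros. The class name ColorCountStats and its methods are hypothetical. When a count goes from k to k + 1, the sum of squares grows by (k + 1)^2 - k^2 = 2k + 1, so each pick is O(1).

```java
/**
 * Sketch: per-color counts plus the running sum and sum of squares of those
 * counts, so the standard deviation is available after every pick.
 * Names are hypothetical, not taken from the answers.
 */
final class ColorCountStats {
    private final long[] counts; // counts[i] = balls of color i seen so far
    private long sum;            // sum of all counts (total balls picked)
    private long sumSq;          // sum of the squared counts

    ColorCountStats(int numColors) {
        this.counts = new long[numColors];
    }

    /** Records one ball of the given color index in O(1). */
    void pick(int color) {
        long old = counts[color]++;
        sum += 1;
        sumSq += 2 * old + 1; // (old + 1)^2 - old^2
    }

    long count(int color) {
        return counts[color];
    }

    double mean() {
        return counts.length > 0 ? (double) sum / counts.length : 0.0;
    }

    /** Sample standard deviation of the per-color counts (divides by n - 1). */
    double standardDeviation() {
        int n = counts.length;
        if (n < 2) {
            return 0.0;
        }
        double variance = (sumSq - (double) sum * sum / n) / (n - 1);
        return Math.sqrt(Math.max(0.0, variance)); // guard against rounding
    }

    public static void main(String[] args) {
        String colors = "BGRY"; // index 0..3, as in the question
        ColorCountStats stats = new ColorCountStats(colors.length());
        for (char c : "BBGRRYYBBGGGGGGB".toCharArray()) {
            stats.pick(colors.indexOf(c));
            // standardDeviation() could be read here after every pick
        }
        System.out.println(stats.standardDeviation());
    }
}
```

For the question's sequence the counts end up as [5,7,2,2], so the sum is 16 and the sum of squares is 82, giving a sample variance of (82 - 256/4)/3 = 6. Dividing by n instead of n - 1 would give the population figure, if that is the one required.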
