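From Wikipedia's Variance article: "the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups." I had to read that several times, then run it: 464 == 464 in the output below is the standard deviation of all the data, the single number you want.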
#!/usr/bin/env python
import sys
import numpy as np
N = 10
exec( "\n".join( sys.argv[1:] ))  # run this.py N=... to override N
np.set_printoptions( precision=1, threshold=100, suppress=True )  # .1f
np.random.seed(1)
data = np.random.exponential( size=( N, 60 )) ** 5  # N rows, 60 cols
row_avs = np.mean( data, axis=-1 )  # av of each row
row_devs = np.std( data, axis=-1 )  # spread, stddev, of each row about its av
print( "row averages:", row_avs )
print( "row devs:", row_devs )
print( "average row dev: %.3g" % np.mean( row_devs ))
# http://en.wikipedia.org/wiki/Variance:
# variance of the total group
# = mean of the variances of the subgroups + variance of the means of the subgroups
avvar = np.mean( row_devs ** 2 )
varavs = np.var( row_avs )
print( "sqrt total variance: %.3g = sqrt( av var %.3g + var avs %.3g )" % (
    np.sqrt( avvar + varavs ), avvar, varavs ))
var_all = np.var( data )  # std^2 all N x 60 about the av of the lot
print( "sqrt variance all: %.3g" % np.sqrt( var_all ))
row averages: [ 49.6 151.4 58.1 35.7 59.7 48. 115.6 69.4 148.1 25. ]
row devs: [ 244.7 932.1 251.5 76.9 201.1 280. 513.7 295.9 798.9 159.3]
average row dev: 375
sqrt total variance: 464 = sqrt( av var 2.13e+05 + var avs 1.88e+03 )
sqrt variance all: 464
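Written out, the identity the script checks looks roughly like this. The symbols are my own notation, not Wikipedia's: k subgroups of equal size, with subgroup means \mu_i and subgroup variances \sigma_i^2. It assumes equal-size subgroups and population variances (divide by n, which is what np.var and np.std do by default); with unequal sizes the means would need to be weighted.

```latex
\sigma_{\text{total}}^2
  = \underbrace{\frac{1}{k}\sum_{i=1}^{k} \sigma_i^2}_{\text{mean of the variances}}
  + \underbrace{\frac{1}{k}\sum_{i=1}^{k} \left(\mu_i - \mu\right)^2}_{\text{variance of the means}},
\qquad
\mu = \frac{1}{k}\sum_{i=1}^{k} \mu_i
```

With the numbers printed above: sqrt( 2.13e+05 + 1.88e+03 ) is about 464, matching the direct computation.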
To see how the variance of a group grows when subgroups are combined, work through the example in Wikipedia's Variance article. Say we have
60 men of heights 180 +- 10, exactly 30: 170 and 30: 190
60 women of heights 160 +- 7, 30: 153 and 30: 167.
The average standard deviation is (10 + 7) / 2 = 8.5. Taken together, though, the heights
-------|||----------|||-|||-----------------|||---
       153          167 170                 190
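spread like 170 +- 13.2, much greater than 170 +- 8.5.
Why? Because we have not only the spreads within each group, men +- 10 and women +- 7, but also the spread of the group means 160 and 180 about the common mean 170.
Exercise: compute the spread 13.2 in two ways: from the formula above, and directly.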
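One possible way to do the exercise, as a minimal sketch: it uses only the standard-library statistics module rather than numpy, and the variable names are mine. pvariance and pstdev are the population versions (divide by n), which is what the formula assumes.

```python
import math
import statistics as st

# Heights from the example: men 30 x 170 and 30 x 190, women 30 x 153 and 30 x 167
men = [170.0] * 30 + [190.0] * 30
women = [153.0] * 30 + [167.0] * 30
groups = [men, women]

# Way 1: mean of the subgroup variances + variance of the subgroup means
av_var = st.mean(st.pvariance(g) for g in groups)   # (100 + 49) / 2
var_avs = st.pvariance([st.mean(g) for g in groups])  # means are 180 and 160
spread_formula = math.sqrt(av_var + var_avs)

# Way 2: directly, the stddev of all 120 heights about the common mean 170
spread_direct = st.pstdev(men + women)

print("av var %.3g + var avs %.3g" % (av_var, var_avs))
print("formula: %.3g  direct: %.3g" % (spread_formula, spread_direct))
```

Both ways give 13.2: the within-group part is 74.5, the between-group part is 100, and sqrt( 174.5 ) is about 13.2.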