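I'm generating a bunch of samples of N normal random variates (mean 0, sd 1), then taking each sample's standard deviation with ddof = 1, which should supposedly give me an unbiased estimator. The process is roughly as follows: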
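import numpy as np
import scipy.stats

def genData(samples = 20, mean = 333.8, sd = 3.38):
    # Draw one sample; return its mean and its sample standard deviation.
    bl = scipy.stats.norm.rvs(loc = mean, scale = sd, size = samples)
    return [np.mean(bl), np.std(bl, ddof = 1)]

means = {}
sds = {}
n = 50000  # number of simulated samples per sample size
for size in range(5, 21):
    x = [genData(size, mean = 0, sd = 1) for _ in range(n)]
    # Build lists explicitly; map() returns a lazy iterator in Python 3.
    means[size] = [d[0] for d in x]
    sds[size] = [d[1] for d in x]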
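However, what I observe instead are the following KDEs:
[Figures: KDE with ddof = 1; KDE with ddof = 2]
Please excuse the rough curves; they come from the small sample size.
ddof = 1 shows a clear bias, while ddof = 2 removes it. What am I doing wrong here?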
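For anyone who wants to check the numbers without the plots, here is a minimal vectorized sketch of the same experiment. It assumes NumPy's `default_rng` is available, and `mean_sample_sd` with its `n_trials` and `seed` parameters is a name made up for this illustration, not part of the code above. It reports the mean of the sample standard deviations, which is not necessarily where a KDE peaks, so it may not line up exactly with what the plots suggest.

```python
import numpy as np

def mean_sample_sd(size, ddof, n_trials=50000, seed=0):
    """Average sample standard deviation over n_trials samples from N(0, 1).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample of `size` standard-normal draws.
    draws = rng.standard_normal((n_trials, size))
    # One standard deviation per row, then average across all rows.
    return np.std(draws, axis=1, ddof=ddof).mean()

# Compare both ddof settings against the true sd of 1 for a few sample sizes.
for size in (5, 10, 20):
    print(size, mean_sample_sd(size, ddof=1), mean_sample_sd(size, ddof=2))
```

The true standard deviation is 1 here, so the distance of each printed value from 1 is the simulated bias of the standard deviation estimate for that ddof.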