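I am trying to implement a very simple example of the law of large numbers using PyMC. The goal is to generate many sample averages for samples of different sizes. For example, in the code below I repeatedly take groups of 5 samples (samples_to_average = 5), compute their mean, and then find the 95% CI of the resulting trace.

The code below runs, but what I would like to do is change samples_to_average to a list, so that I can compute confidence intervals for a range of different sample sizes in one go.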

import numpy as np
import pymc as mc

samples_to_average = 5
# 1000 discrete-uniform draws on the integers 1..10
list_of_samples = mc.DiscreteUniform("response", lower=1, upper=10, size=1000)

@mc.deterministic
def sample_average(x=list_of_samples, n=samples_to_average):
    # Mean of the first n draws
    samples = int(n)
    selected = x[0:samples]
    total = np.sum(selected)
    return float(total) / samples

def getConfidenceInterval():
    responseModel = mc.Model([samples_to_average, list_of_samples, sample_average])
    mapRes = mc.MAP(responseModel)
    mapRes.fit()
    mcmc = mc.MCMC(responseModel)
    mcmc.sample(10000, 5000)  # 10000 iterations, first 5000 discarded as burn-in
    upper = np.percentile(mcmc.trace('sample_average')[:], 95)
    lower = np.percentile(mcmc.trace('sample_average')[:], 5)
    return (lower, upper)

print(getConfidenceInterval())

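Most of the examples I have seen that use the deterministic decorator rely on global stochastic variables. To achieve my goal, however, I think what I need to do is create a stochastic variable (of the correct length) inside getConfidenceInterval() and pass it to sample_average, instead of supplying sample_average through global/default arguments.

How can I pass a variable created in getConfidenceInterval() to sample_average()? Alternatively, what other approach would let me evaluate multiple models with different values of samples_to_average? I would like to avoid global variables if possible.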

1 Answer

Before addressing your question, I would like to simplify the way sample_average is written so that it is more compact and easier to understand.

sample_average = mc.Lambda('sample_average', lambda x=list_of_samples, n=samples_to_average: np.mean(x[:n]))

Now you can generalize this to the case where samples_to_average is an array of parameters:

samples_to_average = np.arange(5, 25, 5)

sample_average = mc.Lambda('sample_average', lambda x=list_of_samples, n=samples_to_average: [np.mean(x[:t]) for t in n])
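To see what the list comprehension inside the Lambda computes, here is a minimal NumPy-only sketch. It does not use PyMC: the array x is a hypothetical stand-in for a single draw of list_of_samples, and prefix_means is an illustrative helper name, not part of any library.

```python
import numpy as np

# Hypothetical stand-in for one draw of list_of_samples (plain NumPy, no PyMC)
rng = np.random.RandomState(0)
x = rng.randint(1, 11, size=1000)  # integers 1..10 inclusive

samples_to_average = np.arange(5, 25, 5)  # sample sizes 5, 10, 15, 20

def prefix_means(x, sizes):
    # One mean per requested sample size, each over the first t draws
    return [np.mean(x[:t]) for t in sizes]

means = prefix_means(x, samples_to_average)
```

Each evaluation therefore yields one value per sample size, which is why the trace becomes two-dimensional once samples_to_average is an array.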

The getConfidenceInterval function would also have to be changed as shown below:

def getConfidenceInterval():
    responseModel = mc.Model([samples_to_average, list_of_samples, sample_average])
    mapRes = mc.MAP(responseModel)
    mapRes.fit()
    mcmc = mc.MCMC(responseModel)
    mcmc.sample( 10000, 5000)
    # Pass a list rather than a generator; recent NumPy versions reject generators here
    average = np.vstack([t for t in mcmc.trace('sample_average')[:]])
    upper = np.percentile(average, 95, axis = 0)
    lower = np.percentile(average, 5, axis = 0)
    return (lower, upper)

I used vstack to aggregate the sample averages into a 2D array (one row per MCMC iteration, one column per sample size) and then used the axis option of NumPy's percentile function to compute the percentiles along each column.
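The stacking and per-column percentile step can be tried without PyMC. The sketch below is an assumption-laden illustration: fake_trace is a hypothetical stand-in for the rows of mcmc.trace('sample_average'), built from independent uniform draws instead of an MCMC run. Note that the 5th and 95th percentiles bracket a central 90% interval; 2.5 and 97.5 would give a 95% interval.

```python
import numpy as np

rng = np.random.RandomState(1)
sizes = np.arange(5, 25, 5)  # sample sizes 5, 10, 15, 20

# Hypothetical stand-in for the trace: one row of sample means per iteration
fake_trace = [[rng.randint(1, 11, size=t).mean() for t in sizes]
              for _ in range(2000)]

average = np.vstack(fake_trace)             # shape (iterations, len(sizes))
lower = np.percentile(average, 5, axis=0)   # one lower bound per sample size
upper = np.percentile(average, 95, axis=0)  # one upper bound per sample size
widths = upper - lower
```

With this setup the interval for the largest sample size should come out narrower than the one for the smallest, which is the behaviour the law of large numbers example is meant to show.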

Answered 2013-11-09T16:56:16.677