我正在将来自 Stan 的 ARM 书中的选定模型“翻译”到 pymc3(我希望尽快将它们发布到 Github 上),并且我有一个关于“不正确的先验”的问题。
我知道 Stan 默认是对参数使用统一的先验。如果无界,这意味着统一(-inf,+inf)
我的问题是:无论如何在pymc3中指定这样的先验?
这是一个示例来说明问题以及我到目前为止所做的尝试。
import numpy as np
import pymc3 as pm
light_speed <- np.array(28, 26, 33, 24, 34, -44, 27, 16, 40, -2, 29, 22, 24, 21, 25,
30, 23, 29, 31, 19, 24, 20, 36, 32, 36, 28, 25, 21, 28, 29,
37, 25, 28, 26, 30, 32, 36, 26, 30, 22, 36, 23, 27, 27, 28,
27, 31, 27, 26, 33, 26, 32, 32, 24, 39, 28, 24, 25, 32, 25,
29, 27, 28, 29, 16, 23)
在斯坦(pystan)
#the model in pystan as specified in stan arm examples
model_string = '''
data {
int<lower=0> N;
vector[N] y;
}
parameters {
vector[1] beta;
real<lower=0> sigma;
}
model {
y ~ normal(beta[1],sigma);
}
'''
# Stan object
StanDSO = ps.StanModel(model_code = model_string)
# data
data = dict(N = len(light_speed),
y = light_speed
)
#fit and verify model results
fit_model = StanDSO.sampling(data=data, iter = 5000, chains = 2, thin = 1)
print fit_model
在pymc3
model_1= pm.Model()
with model_1:
#priors as specified in stan model
mu = pm.Uniform('mu', lower = -np.inf, upper= np.inf)
sigma = pm.Uniform('sigma', lower = 0, upper= np.inf)
#using vague priors works
#mu = pm.Uniform('mu', lower = light_speed.std()/1000.0, upper= light_speed.std()*1000.0)
#sigma = pm.Uniform('sigma', lower = light_speed.std()/1000.0, upper= light_speed.std()*1000.0)
# define likelihood
y_obs = pm.Normal('Y_obs', mu = mu, sd = sigma, observed = light_speed)
# inference fitting the model
# I have to use slice because the following command
#trace = pm.sample(5000)
# produce the error
# ValueError: Cannot construct a ufunc with more than 32 operands
#(requested number were: inputs = 51 and outputs = 1)valueerror
xstart = pm.find_MAP()
xstep = pm.Slice()
trace = pm.sample(5000, xstep, xstart, random_seed = 123, progressbar= True)
pm.summary(trace)
我查看了 glm 的源代码,它似乎默认使用模糊的先验,Krutschke 和许多 BUGS 示例也推荐这种做法并且它有效(见上文)。然而,立场参考手册(第 52 页)表示,他们积极劝阻用户不要使用默认比例先验,因为他们将太多的概率质量集中在合理的后验值之外(...)可能会产生倾斜后验的深远影响。