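I implemented a linear regression model in pymc3 in which the unknown weight vector is constrained to be a probability mass function, and is therefore modeled with a Dirichlet distribution, as in the following code: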
    import numpy as np
    import pymc3 as pm
    from theano.tensor import dot  # assumed source of the bare `dot` used below

    # countries, years, mix, Xs, Ys, ncountries and nyears are defined elsewhere
    with pm.Model() as model:
        # prior on precision of normal likelihood
        tau = pm.Gamma('tau', alpha=1, beta=1)

        phi = np.empty(ncountries, dtype=object)
        y = np.empty((nyears-1, ncountries), dtype=object)
        for icountry, country in enumerate(countries):
            # prior Dirichlet allocation for each country
            phi[icountry] = pm.Dirichlet('mix_{c}'.format(c=country),
                                         np.roll(mix, icountry),
                                         shape=ncountries)
            for iyear, year in enumerate(years[1:]):
                suffix = '_{y}-{c}'.format(y=year, c=country)
                previous_pop = Xs[iyear, :]
                # likelihood
                y[iyear, icountry] = pm.Normal('obs' + suffix,
                                               mu=pm.Deterministic(
                                                   'mu' + suffix,
                                                   dot(phi[icountry], previous_pop)),
                                               tau=tau,
                                               observed=Ys[iyear, icountry])
After sampling the posterior by running:
    start = pm.find_MAP()
    step = pm.Metropolis()
    nsteps = 1000
    trace = pm.sample(nsteps, step, start=start)
I examined the traces of the Dirichlet variables and found that their values do not sum to one (an example is shown below):
    array([[ 0.01029745, 0.00627394, 0.00996922, ..., 1.83955829,
             0.00962185, 0.01020659],
           [ 0.01029745, 0.00627394, 0.00996922, ..., 1.83955829,
             0.00962185, 0.01020659],
           [ 0.01029745, 0.00627394, 0.00996922, ..., 1.83955829,
             0.00962185, 0.01020659],
           ...,
           [ 0.02050308, 0.01685555, 0.01976797, ..., 1.92278065,
             0.03956622, 0.00473735],
           [ 0.01993214, 0.01632033, 0.01994876, ..., 1.92487858,
             0.04078728, 0.00481424],
           [ 0.01900882, 0.01528191, 0.02100671, ..., 1.92485693,
             0.0395159 , 0.00524575]])
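I am not familiar with theano variables, and I am finding it hard to explore how Dirichlet RVs are represented in pymc3. Am I doing something wrong, or should I normalize the values returned in the trace so that they sum to one?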
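To make the check concrete, here is a minimal NumPy sketch of what I mean by testing and normalizing the rows. The `samples` array is made up and only stands in for one of the Dirichlet traces; `rows_on_simplex` and `normalize_rows` are hypothetical helper names, not pymc3 functions. It shows the mechanics only and says nothing about whether rescaling after the fact is statistically sound.

```python
import numpy as np

# Hypothetical stand-in for one Dirichlet trace: rows are samples,
# columns are components. The values are made up for illustration.
samples = np.array([
    [0.2, 0.3, 1.5],
    [0.1, 0.1, 0.8],
    [0.5, 0.25, 0.25],
])

def rows_on_simplex(x, atol=1e-8):
    """Return one boolean per row: entries non-negative and summing to 1."""
    x = np.asarray(x, dtype=float)
    nonneg = (x >= 0).all(axis=1)
    sums_to_one = np.isclose(x.sum(axis=1), 1.0, atol=atol)
    return nonneg & sums_to_one

def normalize_rows(x):
    """Rescale each row so it sums to 1 (assumes positive row sums)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

valid = rows_on_simplex(samples)      # flags the rows that violate the constraint
normalized = normalize_rows(samples)  # every row now sums to 1
```

In this toy array only the first row fails the check, because it sums to 2 rather than 1.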
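Quick update
It looks like the function pm.find_MAP() uses a form of gradient descent optimization. This does not take into account the constraints that arise because a vector drawn from a Dirichlet distribution is a probability mass function (its values should be positive and sum to 1). The constraint is apparently not enforced during the sampling phase of the algorithm either, and it leads to convergence problems as the precision of the likelihood drifts toward zero.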
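A common way to let an unconstrained gradient method respect such a constraint is to optimize unconstrained parameters and map them onto the simplex. The sketch below does this with a softmax map on a made-up least-squares problem in plain NumPy. The softmax choice, the toy data, and the step size are all assumptions for illustration; this is not a description of what pm.find_MAP() does internally.

```python
import numpy as np

def softmax(z):
    """Map an unconstrained vector onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Toy regression problem (made-up data): find weights w on the simplex
# such that X @ w is close to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
w_true = np.array([0.1, 0.2, 0.3, 0.4])
y = X @ w_true

z = np.zeros(4)  # unconstrained parameters; softmax(z) starts uniform
for _ in range(2000):
    w = softmax(z)
    # gradient of the mean squared error with respect to w
    grad_w = 2.0 * X.T @ (X @ w - y) / len(y)
    # chain rule through softmax, whose Jacobian is diag(w) - w w^T
    grad_z = w * (grad_w - w @ grad_w)
    z -= 0.1 * grad_z  # plain gradient step in the unconstrained space

w_hat = softmax(z)  # positive and sums to 1 by construction
```

Because every iterate is produced by `softmax`, the weights stay positive and sum to one at every step, whatever the optimizer does to `z`.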