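For a very simple continuous toy model, NUTS in pymc3 gives me a posterior that appears to be incorrect. It disagrees with both the analytical calculation and the posterior obtained with Metropolis.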

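In the code below, I generate synthetic data with a fixed random seed (so the results are reproducible). I then define the same generative model in pymc3, observing only the final data. Finally, I compare the marginal distribution of one of the latent variables with the true analytical posterior and with the Metropolis posterior. The results disagree.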

#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
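import scipy as sci
import scipy.optimize  # make sci.optimize available explicitly
import scipy.stats  # make sci.stats available explicitly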
import theano.tensor as th

np.random.seed(13)

n = 10
tau_scale = 2
tau0 = sci.stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)

with mc.Model() as model1:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu = mc.Normal('mu', tau=tau, shape=(n,))
    mc.Normal('x', mu=mu, observed=x0)

with mc.Model() as model2:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu_z = mc.Normal('mu_z', shape=(n,))
    mu = mc.Deterministic('mu', mu_z / th.sqrt(tau))
    mc.Normal('x', mu=mu, observed=x0)


def infer(model):
    with model:
        map_ = mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b)
        step = mc.NUTS(scaling=map_)
        trace = mc.sample(100, step=step, start=map_, progressbar=False)
        step = mc.NUTS(scaling=trace[-1])
        return mc.sample(11000, step=step, start=trace[-1], progressbar=False)

trace1 = infer(model1)
trace2 = infer(model2)

with model2:
    trace3 = mc.sample(100000, step=mc.Metropolis(), progressbar=False,
                       start=mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b))

samples_tau1 = trace1['tau'][1000:]
samples_tau2 = trace2['tau'][1000:]
samples_tau3 = trace3['tau'][10000:]

print
print 'pymc3 version: ' + mc.__version__
print
print 'Model 1 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau1.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau1.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau1, 50))
print
print 'Model 2 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau2.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau2.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau2, 50))
print
print 'Model 2 Metropolis tau'
print 'Mean: {0:3.1f}'.format(samples_tau3.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau3.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau3, 50))

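I actually defined the same generative model in two slightly different ways: model1 gives mu a Normal prior with precision tau directly, while model2 draws a standard normal mu_z and rescales it by 1 / sqrt(tau). The output of the program above is as follows.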

deepee@entropy:~$ ./test_inference.py 
Applied log-transform to tau and added transformed tau_log to model.
Applied log-transform to tau and added transformed tau_log to model.

pymc3 version 3.0

Model 1 tau
Mean: 2.5
Standard Deviation: 1.6
Median 2.1

Model 2 tau
Mean: 4.0
Standard Deviation: 2.5
Median 3.4

Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median 2.9

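The true posterior of tau has mean 3.5, standard deviation 2.3, and median 3.0, which agrees with Metropolis. The values also match more closely when I use Stan. I am using a fairly recent commit of pymc3 (ca40cd3b2).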
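
For reference, here is a minimal sketch of one way to compute the analytical posterior of tau numerically. It assumes mu is integrated out by hand, so that each x_i given tau is Normal with mean 0 and variance 1 + 1/tau, and it evaluates the resulting one-dimensional posterior on a grid. The function name tau_posterior_summary and the grid parameters tau_max and num are illustrative choices, not part of the script above.

```python
import numpy as np


def tau_posterior_summary(x, tau_scale=2.0, tau_max=100.0, num=200001):
    """Grid-based posterior summary of tau with mu integrated out analytically.

    Assumed model: tau ~ Exponential(mean=tau_scale), mu_i ~ N(0, 1/tau),
    x_i ~ N(mu_i, 1), hence x_i | tau ~ N(0, 1 + 1/tau).
    """
    x = np.asarray(x, dtype=float)
    tau = np.linspace(1e-6, tau_max, num)
    var = 1.0 + 1.0 / tau  # marginal variance of each x_i given tau

    # Unnormalized log posterior: Exponential prior plus marginal Normal likelihood.
    log_post = (-tau / tau_scale
                - 0.5 * x.size * np.log(var)
                - 0.5 * np.sum(x ** 2) / var)
    dens = np.exp(log_post - log_post.max())  # subtract the max for stability

    def integrate(f):
        # Trapezoidal rule on the tau grid.
        return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(tau))

    dens /= integrate(dens)
    mean = integrate(tau * dens)
    std = np.sqrt(integrate((tau - mean) ** 2 * dens))

    # Median from the cumulative trapezoidal integral of the density.
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(tau)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    median = np.interp(0.5, cdf, tau)
    return mean, std, median
```

Calling it with the script's x0 and tau_scale gives the reference values to compare the traces against. With an empty data array it reduces to the Exponential prior, whose mean and standard deviation both equal tau_scale, which is a quick sanity check on the grid.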
