1

我正在尝试使用 PyMC 将四个不同的医疗状况预测变量作为输入,并将它们结合起来,在给定预测变量子集“是的,该患者患有这种疾病”的情况下,得出患者患有该疾病的总体后验概率。

这个想法是从 beta 分布中为每个预测变量选择一个 theta(条件的总体比率)以及假阴性和假阳性率,然后使用贝叶斯定理计算边际概率和后验概率。我有一个包含 16 个观察值的数组,每个可能的预测变量组合一个(因为有 4 个预测变量,所以有 2**4 = 16 个不同的可能预测变量组合)。我将最后一组计数和边际概率输入多项分布,类似于在 PyMC 教程http://pymc-devs.github.io/pymc/tutorial的以下示例中如何将灾难数组与泊松分布一起使用.html

这是我编写的尝试执行此操作的代码:

from pymc import Multinomial, Beta, deterministic
from numpy import array

n = 4 # number of predictors

counts_array = array([2942808473, 17491655, 21576, 23189, 339805, 89159, 168214, 76044, 43138288, 530963, 22682, 22169, 462052, 129472, 2804257, 3454104]) # 16 counts - one count value for each possible permutation of predictors that detected medical condition
pred = array([[0,0,0,0],[1,0,0,0],[0,1,0,0],[1,1,0,0],[0,0,1,0],[1,0,1,0],[0,1,1,0],[1,1,1,0],[0,0,0,1],[1,0,0,1],[0,1,0,1],[1,1,0,1],[0,0,1,1],[1,0,1,1],[0,1,1,1],[1,1,1,1]]); # array of yes/no's from predictors for each value in counts_array above

theta = Beta('theta',1,2)

fn = fp = tn = tp = [0] * 4;
for i in range(0,n):
    fn[i] = Beta('fn' + str(i),1,2)
    fp[i] = Beta('fp' + str(i),1,2)
    tn[i] = 1 - fp[i]
    tp[i] = 1 - fn[i]

@deterministic(plot=False)
def margprobs():
    mp = [0] * 2**n; # initialize with vector of 2**n zeros
    for i in range(0,2**n):
        mp[i] = theta *\
            (fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
            (fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
            (fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
            (fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
            + (1-theta) *\
            (tn[0]**(1-pred[i][0])) * (fp[0]**pred[i][0]) *\
            (tn[1]**(1-pred[i][1])) * (fp[1]**pred[i][1]) *\
            (tn[2]**(1-pred[i][2])) * (fp[2]**pred[i][2]) *\
            (tn[3]**(1-pred[i][3])) * (fp[3]**pred[i][3]);
    return mp;

@deterministic(plot=False)
def postprobs():
    pp = [0] * 2**n; # initialize with vector of 2**n zeros
    for i in range(0,2**n):
        pp[i] = theta *\
            (fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
            (fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
            (fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
            (fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
            / margprobs[i];
    return pp;

counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)

当我运行它时,在计算计数时,我得到与最后一行有关的错误:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 46, in <module>
    counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/distributions.py", line 3269, in __init__
    verbose=verbose, **kwds)
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
    if not isinstance(self.logp, float):
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
    raise ZeroProbability(self.errmsg)
pymc.Node.ZeroProbability: Stochastic counts's value is outside its support,
 or it forbids its parents' current values

显然,这个错误与 PyMC 不喜欢我提供给 Multinomial() 的某些值有关,但我不确定哪个是错误的。我相信值应该是 counts_array (我观察到的计数值),n 应该是 16,因为我想选择一个包含 16 个项目的数组进行计数,每个可能的预测变量组合一个,p 应该是我的边际概率,并观察到应该是真的,因为我已经观察到了值。

我究竟做错了什么?

编辑:如果有帮助,我之前在 R2jags 中使用以下代码执行此操作:

model {
    theta ~ dbeta(1,2); # draw theta from a beta distribution
    for (i in 1:N) { # draw an false positive and false negative rate for each of N predictors
        fp[i] ~ dbeta(1,2);
        fn[i] ~ dbeta(1,2);
        tp[i] <- 1-fn[i]; # true positive = 1 - false negative rate
        tn[i] <- 1-fp[i]; # true negative rate = 1 - false positive rate
    }
    for (j in 1:M) {
    # Bayes theorem, i.e.
    # posterior probability =
    # P(A) * P(B|A) /
    # /
    # P(A) * P(B|A) + P(-A) * P(B|-A)  # <--- last line is marginal probability
    #
    # x is a vector of 1's and 0's indicating whether the ith predictor said yes or no
    margprobs[j] <- (theta *
                        (fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
                        (fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
                     (fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
                     (fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
                 + (1-theta) *
                        (tn[1]^(1-x[j,1])) * (fp[1]^x[j,1]) *
                     (tn[2]^(1-x[j,2])) * (fp[2]^x[j,2]) *
                     (tn[3]^(1-x[j,3])) * (fp[3]^x[j,3]) *
                     (tn[4]^(1-x[j,4])) * (fp[4]^x[j,4]));

    postprobs[j] <- theta *
                        (fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
                        (fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
                        (fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
                        (fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
                        / margprobs[j];


    }
    counts ~ dmulti(margprobs, total);
}
4

0 回答 0