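I have Mathematica code that computes the 95% confidence interval of a cumulative distribution function (CDF) obtained from a specific probability density function (PDF). The PDF is ugly because it contains the hypergeometric 2F1 function, and I need to compute 2-sigma error bars for a dataset of 15 values.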

I want to translate this code to Python, but I get a very significant divergence in the second half of the values.

Mathematica code

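results holds the lower and upper bounds of the 2-sigma confidence level for the values in xdata. That is, each xdata value should fall between the two corresponding results values.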

navs  = {10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173,  5290, 8816};
freqs = {0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185,   0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166,   0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571}
xdata = {0.578064980346793,   0.030812200935204,   0.316777979844816,  
         0.353718150091612,   0.287659600326548,   0.269254388840293,  
         0.16545714457921,   0.138759871084825,    0.0602382519940077,   
         0.10120771961,  0.065311134782518,    0.105235790998594,   
         0.124642033979457,   0.0271909963701794,  0.0686653810421847};
data = MapThread[{#1, #2, #3} &, {navs, freqs, xdata}]

post[x_, n_, y_] = 
     (n - 1) (1 - x)^n (1 - y)^(n - 2) Hypergeometric2F1[n, n, 1, x*y]

integral = Map[(values = #; mesh = Subdivide[0, 1, 1000]; 
     Interpolation[
      DeleteDuplicates[{Map[
            SetPrecision[post[#, values[[1]], values[[3]]^2], 100] &, 
            mesh] // (Accumulate[#] - #/2 - #[[1]]/
               2) & // #/#[[-1]] &, 
         mesh}\[Transpose], (#1[[1]] == #2[[1]] &)], 
      InterpolationOrder -> 1]) &, data];

results = 
 MapThread[{Sqrt[#1[.025]], Sqrt[#1[0.975]]} &, {integral, data}]

{{0.207919, 0.776508}, {0.0481485, 0.535278}, {0.0834002, 0.574447}, 
{0.137742, 0.551035}, {0.121376, 0.455097}, {0.136889, 0.403306}, 
{0.0674029, 0.279408}, {0.0612534, 0.228762}, {0.0158357, 0.134521}, 
{0.0525374, 0.156055}, {0.0270589, 0.108861}, {0.0740978, 0.137691}, 
{0.100498, 0.149646}, {0.00741129, 0.0525161}, {0.0507748, 0.0850961}}

Python code

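Here is my translation. results is the same quantity as before, truncated to 7 digits for readability.

The results values I get start to diverge from about the 7th pair onward, and the last four xdata points do not fall between the two corresponding results values.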

import numpy as np
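# cumtrapz is the older name; recent SciPy releases only provide cumulative_trapezoid
from scipy.integrate import cumulative_trapezoid as cumtrapz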
from scipy.interpolate import interp1d
from mpmath import *

mesh = list(np.linspace(0,1,1000));

navs = [10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173, 5290, 8816]
freqs = [0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185, 0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166, 0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571]
xdata = [0.578064980346793, 0.030812200935204, 0.316777979844816, 
0.353718150091612,0.287659600326548, 0.269254388840293,
0.16545714457921, 0.138759871084825, 0.0602382519940077, 
0.10120771961, 0.065311134782518, 0.105235790998594, 
0.124642033979457, 0.0271909963701794, 0.0686653810421847]

def post(x,n,y):
    post = (n-1)*((1-x)**n)*((1-y)**(n-2))*hyp2f1(n,n,1,x*y)
    return post

# setting the numeric precision to 100 as in Mathematica
# trying to get the most precise hypergeometric function values
mp.dps = 100
mp.pretty = True

results = []

for i in range(len(navs)):
    postprob = [];
    for j in range(len(mesh)):    
        posterior = post(mesh[j], navs[i], xdata[i]**2)
        postprob.append(posterior)
# calculate the norm of the pdf for integration
    norm = np.trapz(np.array(postprob),mesh);
# integrate pdf/norm to obtain cdf
    integrate = list(np.unique(cumtrapz(np.array(postprob)/norm, mesh, initial=0)));
    mesh2 = list(np.linspace(0,1,len(integrate)));
# interpolate inverse cdf to obtain the 2sigma quantiles
    icdf = interp1d(integrate, mesh2, bounds_error=False, fill_value='extrapolate');
    results.append(list(np.sqrt(icdf([0.025, 0.975]))))

results

[[0.2079198, 0.7765088], [0.0481485, 0.5352773], [0.0834, 0.5744489],
 [0.1377413, 0.5510352], [0.1218029, 0.4566994], [0.1399324, 0.4122767],
 [0.0733743, 0.3041607], [0.0739691, 0.2762597], [0.0230135, 0.1954886],
 [0.0871462, 0.2588804], [0.05637, 0.2268962],   [0.1731199, 0.3217401],
 [0.2665897, 0.3969059], [0.0315915, 0.2238736], [0.2224567, 0.3728803]]
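The statement that some points fall outside their intervals can be checked directly from the numbers listed above. The following is a minimal sketch that only reuses the posted xdata and the posted Python output; the names python_results and outside are introduced here for illustration.

```python
# Values copied from the question: the data points and the Python output.
xdata = [0.578064980346793, 0.030812200935204, 0.316777979844816,
         0.353718150091612, 0.287659600326548, 0.269254388840293,
         0.16545714457921, 0.138759871084825, 0.0602382519940077,
         0.10120771961, 0.065311134782518, 0.105235790998594,
         0.124642033979457, 0.0271909963701794, 0.0686653810421847]

python_results = [[0.2079198, 0.7765088], [0.0481485, 0.5352773], [0.0834, 0.5744489],
                  [0.1377413, 0.5510352], [0.1218029, 0.4566994], [0.1399324, 0.4122767],
                  [0.0733743, 0.3041607], [0.0739691, 0.2762597], [0.0230135, 0.1954886],
                  [0.0871462, 0.2588804], [0.05637, 0.2268962], [0.1731199, 0.3217401],
                  [0.2665897, 0.3969059], [0.0315915, 0.2238736], [0.2224567, 0.3728803]]

# Collect the 0-based indices of points lying outside their [lower, upper] interval.
outside = [i for i, (x, (lo, hi)) in enumerate(zip(xdata, python_results))
           if not lo <= x <= hi]

print(outside)  # prints [1, 11, 12, 13, 14]
```

Indices 11 to 14 are the last four points. Index 1 also lies below its lower bound, and it does so in the Mathematica output as well, so it is not specific to the Python translation.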

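Thanks to the comments on this question, I found out that:

  • The hypergeometric function gives different results in the two languages. Using the same input values, Hypergeometric2F1 in Mathematica gives 1.0588267, whereas mpmath.hyp2f1 in Python gives 1.0588866. This is the second point of the mesh, and the difference is in the fifth decimal place.

Is there a better definition of this special function that I have not found?

  • I still don't know whether this is due only to the hypergeometric function or also to the integration method, but it is definitely a starting point.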
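Before attributing the difference to the two special-function implementations, it may be worth confirming that both programs evaluate the function at the same points. Subdivide[0, 1, 1000] splits the interval into 1000 subintervals and so returns 1001 points, whereas np.linspace(0, 1, 1000) returns 1000 points. The sketch below only compares the second mesh point in each case; the variable names are illustrative, and whether this accounts for the observed gap in the 2F1 values is not established here.

```python
import numpy as np

# Mesh as built in the Python code above: 1000 points, spacing 1/999.
mesh_python = np.linspace(0, 1, 1000)

# Mesh matching Subdivide[0, 1, 1000]: 1000 subintervals, 1001 points, spacing 1/1000.
mesh_subdivide = np.linspace(0, 1, 1001)

# The second mesh point differs, so any function evaluated there
# receives a slightly different argument in the two programs.
print(len(mesh_python), mesh_python[1])
print(len(mesh_subdivide), mesh_subdivide[1])
```

If the second points differ, the inputs passed to the hypergeometric function at "the second point of the mesh" are not actually identical in the two languages.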

(I am relatively new to Python, so the code may be a bit naive.)
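One difference between the two programs that may be worth examining: the Mathematica code removes duplicates from {CDF value, mesh point} pairs, so each surviving CDF value stays attached to its own mesh point, while the Python code applies np.unique to the CDF values alone and then builds a fresh, evenly spaced mesh2 of the new length. The following is a minimal sketch of an inversion step that keeps the pairing. The function name quantiles_from_pdf is hypothetical, the PDF values are assumed to be convertible to ordinary floats, and the trapezoid rule is used here rather than the exact accumulation in the Mathematica code.

```python
import numpy as np

def quantiles_from_pdf(x, pdf, probs):
    """Invert a tabulated PDF's CDF at the given probabilities.

    Hypothetical helper: assumes x is increasing and pdf is non-negative
    and representable as float (very small or large mpmath values may
    need rescaling before conversion).
    """
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)

    # Cumulative trapezoid rule, starting at 0, then normalized to end at 1.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]

    # Drop duplicate CDF values but remember which mesh point each came from.
    cdf_unique, first_idx = np.unique(cdf, return_index=True)

    # Interpolate the inverse CDF linearly on the surviving (cdf, x) pairs.
    return np.interp(probs, cdf_unique, x[first_idx])

# Usage on a PDF that is zero over most of the interval, so that the CDF
# has many repeated values, as happens for sharply peaked posteriors.
mesh = np.linspace(0, 1, 1001)
box_pdf = (mesh < 0.1).astype(float)
median = quantiles_from_pdf(mesh, box_pdf, [0.5])[0]
print(median)
```

Here the median stays near 0.05, inside the region where the PDF is non-zero, because the mesh points are taken from the original mesh rather than from a re-spaced one.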
