1

我想将 Python 代码复制到 R 代码中关于 Stick-break 过程,这是 Dirichlet Process 的构造方案之一。但是,我在 R 中绘制的图完全不同,因为 DP 样本分布不在基本分布 H 周围。

参考 Python 代码来自Austin Rochford 的博客

from matplotlib import pyplot as plt
import numpy as np
import pymc3 as pm
import scipy.stats as ss
import seaborn as sns
from statsmodels.datasets import get_rdataset
from theano import tensor as T
np.random.seed(433)
N=20    
K=30    
alpha=50
H = ss.norm # base dist
beta = ss.beta.rvs(1,alpha, size=(N,K))                          
pi = np.empty_like(beta)
pi[:, 0] = beta[:,0]
pi[:, 1:] = beta[:, 1:] * (1-beta[:, :-1]).cumprod(axis=1)
omega = H.rvs(size=(N,K))

x_plot = np.linspace(-3,3,200)            
sample_cdfs = (pi[..., np.newaxis]* np.less.outer(omega, x_plot)).sum(axis=1)

fig, ax = plt.subplots(figsize=(8,6))

ax.plot (x_plot, sample_cdfs[0],c="gray", alpha=0.75, label = "DP sample CDFs")
ax.plot(x_plot, sample_cdfs[1:].T, c="gray", alpha=0.75) 
ax.plot(x_plot, H.cdf(x_plot), c= "k", label = "Base CDF")

ax.set_title(r'$\alpha = {}$'.format(alpha))
ax.legend(loc=2)

右侧的图是 Python 代码中的结果。

α=50 的 DP

我试图将其转换为 R 代码:

library(yarrr)

N=20;K=30;ngrid=200;alpha=50
xgrid = seq(-3,3,length.out=ngrid)
betas = matrix(rbeta(N*K, 1, alpha),nr=N, nc=K)
stick.to.right = c(1, cumprod(1 - betas))[1:K]
pis.temp = stick.to.right * betas
omega = matrix(rnorm(N*K),nr=N,nc=K)

dirac = array(numeric(N*K*ngrid),dim=c(N,K,ngrid))

for(i in 1:N){
    for(j in 1:K){
        for(k in 1:ngrid){
            dirac[i,j,k]=ifelse(omega[i,j]<xgrid[k],TRUE,FALSE)
        }
    }
}

pis = array(pis.temp,dim=c(N,K,200))

sample_cdfs = apply(pis* dirac,c(1,3),sum)

plot(xgrid,sample_cdfs[1,],col=piratepal("pony"),type="l",lwd=1,ylim=c(0,1))
for(i in 2:N) lines(xgrid,sample_cdfs[i,],col=piratepal("pony")[i])
lines(xgrid,pnorm(xgrid),lwd=2)

我画的图是 dp,alpha=50:

α=50 的 DP

如何修改 R 代码以提供与 Python 代码相似的结果?

4

0 回答 0