我想将 Python 代码复制到 R 代码中关于 Stick-break 过程,这是 Dirichlet Process 的构造方案之一。但是,我在 R 中绘制的图完全不同,因为 DP 样本分布不在基本分布 H 周围。
参考 Python 代码来自Austin Rochford 的博客。
from matplotlib import pyplot as plt
import numpy as np
import pymc3 as pm
import scipy.stats as ss
import seaborn as sns
from statsmodels.datasets import get_rdataset
from theano import tensor as T
np.random.seed(433)
N=20
K=30
alpha=50
H = ss.norm # base dist
beta = ss.beta.rvs(1,alpha, size=(N,K))
pi = np.empty_like(beta)
pi[:, 0] = beta[:,0]
pi[:, 1:] = beta[:, 1:] * (1-beta[:, :-1]).cumprod(axis=1)
omega = H.rvs(size=(N,K))
x_plot = np.linspace(-3,3,200)
sample_cdfs = (pi[..., np.newaxis]* np.less.outer(omega, x_plot)).sum(axis=1)
fig, ax = plt.subplots(figsize=(8,6))
ax.plot (x_plot, sample_cdfs[0],c="gray", alpha=0.75, label = "DP sample CDFs")
ax.plot(x_plot, sample_cdfs[1:].T, c="gray", alpha=0.75)
ax.plot(x_plot, H.cdf(x_plot), c= "k", label = "Base CDF")
ax.set_title(r'$\alpha = {}$'.format(alpha))
ax.legend(loc=2)
右侧的图是 Python 代码中的结果。
我试图将其转换为 R 代码:
library(yarrr)
N=20;K=30;ngrid=200;alpha=50
xgrid = seq(-3,3,length.out=ngrid)
betas = matrix(rbeta(N*K, 1, alpha),nr=N, nc=K)
stick.to.right = c(1, cumprod(1 - betas))[1:K]
pis.temp = stick.to.right * betas
omega = matrix(rnorm(N*K),nr=N,nc=K)
dirac = array(numeric(N*K*ngrid),dim=c(N,K,ngrid))
for(i in 1:N){
for(j in 1:K){
for(k in 1:ngrid){
dirac[i,j,k]=ifelse(omega[i,j]<xgrid[k],TRUE,FALSE)
}
}
}
pis = array(pis.temp,dim=c(N,K,200))
sample_cdfs = apply(pis* dirac,c(1,3),sum)
plot(xgrid,sample_cdfs[1,],col=piratepal("pony"),type="l",lwd=1,ylim=c(0,1))
for(i in 2:N) lines(xgrid,sample_cdfs[i,],col=piratepal("pony")[i])
lines(xgrid,pnorm(xgrid),lwd=2)
我画的图是 dp,alpha=50:
如何修改 R 代码以提供与 Python 代码相似的结果?