r - 当观察到的节点是潜在节点的最大值时使用 JAGS 或 STAN

Question

我有以下潜变量模型： Person j 有两个潜变量 X _j1和 X _j2。我们唯一能观察到的是它们的最大值，Y _j = max(X _j1 , X _j2 )。潜变量是二元正态的；它们每个都有均值 mu、方差 sigma ²，并且它们的相关性是 rho。我想仅使用 Y _{j估计三个参数（mu、sigma}²、rho），数据来自 n 个患者，j = 1,...,n。

我试图在 JAGS 中拟合这个模型（所以我将先验放在参数上），但我无法编译代码。这是我用来调用 JAGS 的 R 代码。首先，在给定参数的一些真实值的情况下，我生成数据（潜在变量和观察变量）：

# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7

# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)

然后我定义了 JAGS 模型，并编写了一个小函数来运行 JAGS 采样器并返回样本：

# JAGS model code
model.text <- '
model {
  for (i in 1:n) {
    Y[i] <- max(X[i,1], X[i,2]) # Ack!
    X[i,1:2] ~ dmnorm(X_mean, X_prec)
  }

  # mean vector and precision matrix for X[i,1:2]
  X_mean <- c(mu, mu)
  X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
  X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
  X_prec[1,2] <- X_prec[2,1]
  X_prec[2,2] <- X_prec[1,1]

  mu ~ dnorm(0, 1)
  sigma2 <- 1 / tau
  tau ~ dgamma(2, 1)
  rho ~ dbeta(2, 2)
}
'

# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
  require(rjags)
  if (!latent)
    model.text <- sub('\n *Y.*?\n', '\n', model.text)
  textCon <- textConnection(model.text)
  fit <- jags.model(textCon, data, n.adapt=n.adapt)
  close(textCon)
  update(fit, n.iter=n.burnin)
  coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}

最后，我调用 JAGS，只提供观察到的数据：

samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)

遗憾的是，这会导致错误消息：“Y[1] 是一个逻辑节点，无法观察到”。JAGS 不喜欢我使用“<-”为 Y[i] 赋值（我用“Ack!”表示违规行）。我理解投诉，但我不知道如何重写模型代码来解决这个问题。

此外，为了证明其他一切（除了“Ack！”行）都很好，我再次运行模型，但这次我将 X 数据输入它，假装它确实被观察到了。这运行得很好，我得到了很好的参数估计：

samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)

如果你能找到一种方法在 STAN 而不是 JAGS 中对这个模型进行编程，那对我来说很好。

score 3 · Accepted Answer

Theoretically you can implement a model like this in JAGS using the dsum distribution (which in this case uses a bit of a hack as you are modelling the maximum and not the sum of the two variables). But the following code does compile and run (although it does not 'work' in any real sense - see later):

set.seed(2017-02-08)

# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7

# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)

model.text <- '
model {
  for (i in 1:n) {
    Y[i] ~ dsum(max_X[i])
    max_X[i] <- max(X[i,1], X[i,2])
    X[i,1:2] ~ dmnorm(X_mean, X_prec)
    ranks[i,1:2] <- rank(X[i,1:2])
    chosen[i] <- ranks[i,2]
  }

  # mean vector and precision matrix for X[i,1:2]
  X_mean <- c(mu, mu)
  X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
  X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
  X_prec[1,2] <- X_prec[2,1]
  X_prec[2,2] <- X_prec[1,1]

  mu ~ dnorm(0, 1)
  sigma2 <- 1 / tau
  tau ~ dgamma(2, 1)
  rho ~ dbeta(2, 2)

  #data# n, Y
  #monitor# mu, sigma2, rho, tau, chosen[1:10]
  #inits# X
}
'

library('runjags')

results <- run.jags(model.text)
results
plot(results)

Two things to note:

JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
The max() constraint causes problems to which I can't think of a solution within a general framework: unlike the usual dsum constraint that allows one parameter to decrease while the other increases and therefore both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. This will very very rarely (i.e. never) lead to values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.

As far as I can see the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).

Solutions to the failure to mix are harder, although something akin to the Carlin & Chibb method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see: Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.

Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.

By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.

Matt

ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but it it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.

score 0 · Accepted Answer

我相信您可以用 Stan 语言对此进行建模，将可能性视为具有相同权重的两分量混合。Stan 代码可能看起来像

data {
  int<lower=1> N;
  vector[N] Y;
}
parameters {
  vector<upper=0>[2] diff[N];
  real mu;
  real<lower=0> sigma;
  real<lower=-1,upper=1> rho;
}
model {
  vector[2] case_1[N];
  vector[2] case_2[N];
  vector[2] mu_vec;
  matrix[2,2] Sigma;
  for (n in 1:N) {
    case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
    case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
  }
  mu_vec[1] = mu; mu_vec[2] = mu;
  Sigma[1,1] = square(sigma);
  Sigma[2,2] = Sigma[1,1];
  Sigma[1,2] = Sigma[1,1] * rho;
  Sigma[2,1] = Sigma[1,2];
  // log-likelihood
  target += log_mix(0.5, multi_normal_lpdf(case_1 | mu_vec, Sigma), 
                         multi_normal_lpdf(case_2 | mu_vec, Sigma));
  // insert priors on mu, sigma, and rho
}

r - 当观察到的节点是潜在节点的最大值时使用 JAGS 或 STAN

2 回答 2

Related

Reference