python - Tensorflow概率中多项式模型的规范

Question

我正在使用 Tensorflow Probability 中的混合多项式离散选择模型。该函数应在 3 个选项中选择一个输入。选择的替代方案由 CHOSEN（一个 #observationsx3 张量）指定。下面是代码的更新，以反映我在问题上的进展（但问题仍然存在）。

我目前收到错误：

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6768,3] vs. [1,3,6768] [Op:Mul]

回溯表明问题出在调用 log_prob() 以获取联合分布的最终组件（即 tfp.Independent(tfp.Multinomial(...))

主要组件是（感谢 Padarn Wilson 帮助修复联合分布定义）：

@tf.function
def affine(x, kernel_diag, bias=tf.zeros([])):
  """`kernel_diag * x + bias` with broadcasting."""
  kernel_diag = tf.ones_like(x) * kernel_diag
  bias = tf.ones_like(x) * bias
  return x * kernel_diag + bias

def mmnl_func():
    adj_AV_train = (tf.ones(num_idx) - AV[:,0]) * tf.constant(-9999.)
    adj_AV_SM = (tf.ones(num_idx) - AV[:,1]) * tf.constant(-9999.)
    adj_AV_car = (tf.ones(num_idx) - AV[:,2]) * tf.constant(-9999.)

    return tfd.JointDistributionSequential([
        tfd.Normal(loc=0., scale=1e5),  # mu_b_time
        tfd.HalfCauchy(loc=0., scale=5),  # sigma_b_time
        lambda sigma_b_time,mu_b_time: tfd.MultivariateNormalDiag(  # b_time
        loc=affine(tf.ones([num_idx]), mu_b_time[..., tf.newaxis]),
        scale_diag=sigma_b_time*tf.ones(num_idx)),
        tfd.Normal(loc=0., scale=1e5), # a_train
        tfd.Normal(loc=0., scale=1e5), # a_car
        tfd.Normal(loc=0., scale=1e5), # b_cost
        lambda b_cost,a_car,a_train,b_time: tfd.Independent(tfd.Multinomial(
          total_count=1,
          logits=tf.stack([
              affine(DATA[:,0], tf.gather(b_time, IDX[:,0], axis=-1), (a_train + b_cost * DATA[:,1] + adj_AV_train)),
              affine(DATA[:,2], tf.gather(b_time, IDX[:,0], axis=-1), (b_cost * DATA[:,3] + adj_AV_SM)),
              affine(DATA[:,4], tf.gather(b_time, IDX[:,0], axis=-1), (a_car + b_cost * DATA[:,5] + adj_AV_car))
          ], axis=1)
        ),reinterpreted_batch_ndims=1)
    ])

@tf.function
def mmnl_log_prob(mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost):
    return mmnl_func().log_prob(
      [mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost, CHOICE])

# Based on https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args
# change constant values to tf.constant()
nuts_samples = tf.constant(1000)
nuts_burnin = tf.constant(500)
num_chains = tf.constant(1)
## Initial step size
init_step_size= tf.constant(0.3)
# Set the chain's start state.
initial_state = [
    tf.zeros([num_chains], dtype=tf.float32, name="init_mu_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_sigma_b_time"),
    tf.zeros([num_chains, num_idx], dtype=tf.float32, name="init_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_train"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_car"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_b_cost")
]

## NUTS (using inner step size averaging step)
##
@tf.function
def nuts_sampler(init):
    nuts_kernel = tfp.mcmc.NoUTurnSampler(
      target_log_prob_fn=mmnl_log_prob, 
      step_size=init_step_size,
      )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
  inner_kernel=nuts_kernel,
  num_adaptation_steps=nuts_burnin,
  step_size_getter_fn=lambda pkr: pkr.step_size,
  log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
  step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
       )

    samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
  num_results=nuts_samples,
  current_state=initial_state,
  kernel=adapt_nuts_kernel,
  num_burnin_steps=tf.constant(100),
  parallel_iterations=tf.constant(5))
    return samples_nuts_, stats_nuts_

samples_nuts, stats_nuts = nuts_sampler(initial_state)

score 0 · Accepted Answer

这很可能是您的初始状态和链数的问题。您可以尝试在采样器调用之外初始化内核：

nuts_kernel = tfp.mcmc.NoUTurnSampler(
      target_log_prob_fn=mmnl_log_prob, 
      step_size=init_step_size,
      )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
  inner_kernel=nuts_kernel,
  num_adaptation_steps=nuts_burnin,
  step_size_getter_fn=lambda pkr: pkr.step_size,
  log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
  step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
       )

然后做

nuts_kernel.bootstrap_results(initial_state)

并调查 logL 的形状，并返回提案状态。

另一件事是将您的初始状态输入您的对数似然/后验，并查看返回的对数似然的尺寸是否与您认为的匹配（如果您正在执行多个链，那么它可能应该返回 #chains对数可能性）。

据我了解，批处理维度（#chains）必须是所有矢量化计算中的第一个维度。

我关于 tensorflow 和自定义可能性的博客文章的最后一部分有一个示例的工作代码。

score 0 · Accepted Answer

我能够从我的模型中得到合理的结果。感谢大家的帮助！以下几点有助于解决各种问题。

使用 JointDistributionSequentialAutoBatched() 生成一致的批次形状。您需要安装 tf-nightly 才能访问。
超参数的更多信息先验。Multinomial() 分布中的指数变换意味着无信息的超参数（即 sigma = 1e5）意味着您很快就会有大量正数进入 exp()，从而导致无穷大。
设置步长等也很重要。
我找到了Christopher Suter 对 Tensorflow Probability 论坛上最近一个问题的回答，该问题指定了来自 STAN 的模型很有用。我利用从我的先验中获取样本作为初始似然参数有用的起点。
尽管 JointDistributionSequentialAutoBatched() 纠正了批次形状，但我返回并纠正了我的联合分布形状，以便打印 log_prob_parts() 给出一致的形状（即，10 个链的 [10,1]）。在不使用 JointDistributionSequentialAutoBatched() 的情况下，我仍然会出现形状错误，但组合似乎有效。
我将我的 affine() 分成两个函数。他们做同样的事情，但删除了回溯警告。基本上， affine() 能够广播输入，但它们不同，并且更容易编写两个函数来设置具有一致形状的输入。不同形状的输入会导致 Tensorflow 多次跟踪函数。

python - Tensorflow概率中多项式模型的规范

2 回答 2

Related

Reference