python - Correct loss function when using the local reparameterization trick with Bayes by Backprop

Question

I am trying to implement a simple linear layer based on the Bayes by Backprop method from Blundell et al., "Weight Uncertainty in Neural Networks".

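First, I used the regular reparameterization trick: I sampled the weights from the weight distribution and then computed the loss with the following formula (as described in the paper):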

$$\text{loss} = \frac{1}{M}\left[\log q(w \mid \theta) - \log P(w)\right] - \log P(D_i \mid w)$$

where

w = a weight sample drawn from q(w|theta)
M = number of minibatches
log q(w|theta) = log-density of the variational posterior
log P(w) = log-density of the prior
log P(D_i|w) = log-likelihood of minibatch D_i
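As a minimal sketch (assuming PyTorch, as in the question's code, and a classification task with a categorical likelihood), the single-sample minibatch loss from the formula above can be written as follows. The function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def log_likelihood_categorical(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # log P(D_i | w) for a categorical likelihood, summed over the minibatch
    # (assumption: classification; a regression model would use a Gaussian log-density)
    return -F.cross_entropy(logits, targets, reduction="sum")


def minibatch_loss(log_variational_posterior: torch.Tensor,
                   log_prior: torch.Tensor,
                   log_likelihood: torch.Tensor,
                   num_batches: int) -> torch.Tensor:
    # Complexity cost (1/M) * [log q(w|theta) - log P(w)], spread evenly over M minibatches
    complexity_cost = (log_variational_posterior - log_prior) / num_batches
    # Likelihood cost: negative log-likelihood of the current minibatch
    return complexity_cost - log_likelihood
```

In a training loop, `log_variational_posterior` and `log_prior` would be the values computed in the layer below, evaluated at the same weight sample that was used for the forward pass.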

Following shishir13sharma's implementation, my code looks like this:

# log-density of the Gaussian variational posterior, evaluated at the sampled weights and biases
weight_log_gaussian_distr = (-math.log(math.sqrt(2 * math.pi)) - torch.log(self.weight_sigma) - ((self.weight - self.weight_mu) ** 2) / (2 * self.weight_sigma ** 2)).sum()
bias_log_gaussian_distr = (-math.log(math.sqrt(2 * math.pi)) - torch.log(self.bias_sigma) - ((self.bias - self.bias_mu) ** 2) / (2 * self.bias_sigma ** 2)).sum()
# log q(w|theta): variational posterior over all weights and biases
self.log_variational_posterior = weight_log_gaussian_distr + bias_log_gaussian_distr
# log P(w): scale mixture prior (implemented separately)
self.log_prior = self.weight_prior.log_mixed_prior(self.weight) + self.bias_prior.log_mixed_prior(self.bias)
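The `log_mixed_prior` method used above is implemented separately and not shown. A minimal sketch of what such a scale mixture prior could look like, following the two-component form in Blundell et al. (P(w) = product over j of pi * N(w_j | 0, sigma1^2) + (1 - pi) * N(w_j | 0, sigma2^2)), is below. The class name and the default values of `pi`, `sigma1` and `sigma2` are illustrative assumptions.

```python
import math

import torch


class ScaleMixturePrior:
    """Hypothetical two-component scale mixture prior over independent weights."""

    def __init__(self, pi: float = 0.5, sigma1: float = 1.0, sigma2: float = math.exp(-6)):
        # Mixing weight pi must lie strictly between 0 and 1 for both logs to be finite
        self.log_pi = math.log(pi)
        self.log_1m_pi = math.log(1.0 - pi)
        self.normal1 = torch.distributions.Normal(0.0, sigma1)
        self.normal2 = torch.distributions.Normal(0.0, sigma2)

    def log_mixed_prior(self, w: torch.Tensor) -> torch.Tensor:
        # Per-weight log of each weighted component
        log_p1 = self.log_pi + self.normal1.log_prob(w)
        log_p2 = self.log_1m_pi + self.normal2.log_prob(w)
        # log-sum-exp over the two components avoids underflow when sigma2 is tiny;
        # summing over weights gives the log of the product over j
        return torch.logsumexp(torch.stack([log_p1, log_p2]), dim=0).sum()
```

Usage: `ScaleMixturePrior().log_mixed_prior(torch.randn(10, 5))` returns a scalar tensor that can be added to the bias term as in the line above.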

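However, the local reparameterization trick is recommended instead of the regular reparameterization trick because (if I'm not mistaken) it gives lower-variance gradient estimates and therefore a more stable and accurate optimization. With local reparameterization, the weights are not sampled; instead, the mean and variance of the pre-activations are computed from the means and variances of the weights, and the pre-activations are then sampled from that distribution.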
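A minimal sketch of the local reparameterization trick for a linear layer, assuming a factorized Gaussian posterior with sigma = softplus(rho) as in Blundell et al. The class name, parameter names and initialization values are hypothetical. For pre-activations b = x W^T + bias, each entry of b is Gaussian with mean x mu^T + mu_bias and variance (x^2)(sigma^2)^T + sigma_bias^2, so b can be sampled directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalReparamLinear(nn.Module):
    """Hypothetical Bayesian linear layer using the local reparameterization trick."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Variational parameters of q(w|theta) = N(mu, softplus(rho)^2), elementwise
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)
        # Mean and variance of the pre-activations implied by q(w|theta)
        act_mu = F.linear(x, self.weight_mu, self.bias_mu)
        act_var = F.linear(x ** 2, weight_sigma ** 2, bias_sigma ** 2)
        # Sample the pre-activations directly: one independent noise draw per example and unit
        eps = torch.randn_like(act_mu)
        return act_mu + torch.sqrt(act_var) * eps
```

Note that this forward pass never materializes a weight sample w; only the pre-activations are sampled.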

My question:

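I noticed that other implementations (e.g. kumar-shridhar or ThirstyScholar) do not use the scale mixture prior proposed by Blundell et al. Instead, they use the closed-form KL term familiar from the variational autoencoder loss. However, some projects compute the KL divergence between the weight distribution and the prior, KL(q(w|theta)||P(w)), while others use the activation distribution, KL(q(act|theta)||P(w)).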
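For reference, the closed-form term for a factorized Gaussian posterior q(w|theta) = N(mu, sigma^2) and a zero-mean Gaussian prior P(w) = N(0, sigma_p^2) is KL = sum over weights of [log(sigma_p / sigma) + (sigma^2 + mu^2) / (2 sigma_p^2) - 1/2]. This closed form requires a Gaussian prior; it does not apply to a scale mixture prior. A minimal sketch, with a hypothetical function name and `prior_sigma` as an assumed parameter:

```python
import torch


def gaussian_kl(mu: torch.Tensor, sigma: torch.Tensor, prior_sigma: float = 1.0) -> torch.Tensor:
    """Closed-form KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over all elements."""
    prior_sigma = torch.as_tensor(prior_sigma, dtype=mu.dtype)
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
          - 0.5)
    return kl.sum()
```

The result agrees with `torch.distributions.kl_divergence` applied to two `Normal` distributions and summed.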

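When using the local reparameterization trick, should the KL divergence be computed with the weight distribution or with the activation distribution? And does it make sense to use Blundell et al.'s scale mixture prior together with the local reparameterization trick?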
