
In both TensorFlow Probability (v0.4.0) and PyTorch (v0.4.1), the KL divergence between the Normal distribution (tfp, PyTorch) and the Laplace distribution (tfp, PyTorch) is not implemented, resulting in a NotImplementedError being raised.

>>> import tensorflow as tf
>>> import tensorflow_probability as tfp
>>> tfd = tfp.distributions
>>> import torch
>>>
>>> tf.__version__
'1.11.0'
>>> tfp.__version__
'0.4.0'
>>> torch.__version__
'0.4.1'
>>> 
>>> p = tfd.Normal(loc=0., scale=1.)
>>> q = tfd.Laplace(loc=0., scale=1.)
>>> tfd.kl_divergence(p, q)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda/envs/example/lib/python3.6/site-packages/tensorflow/python/ops/distributions/kullback_leibler.py", line 95, in kl_divergence
    % (type(distribution_a).__name__, type(distribution_b).__name__))
NotImplementedError: No KL(distribution_a || distribution_b) registered for distribution_a type Normal and distribution_b type Laplace
>>> 
>>> a = torch.distributions.normal.Normal(loc=0., scale=1.)
>>> b = torch.distributions.laplace.Laplace(loc=0., scale=1.)
>>> torch.distributions.kl.kl_divergence(a,b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda/envs/example/lib/python3.6/site-packages/torch/distributions/kl.py", line 161, in kl_divergence
    raise NotImplementedError
NotImplementedError

I assume this is missing from both libraries for good reason, and that users are expected to implement it themselves with tfp.distributions.RegisterKL in TensorFlow Probability and torch.distributions.kl.register_kl in PyTorch.
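To illustrate what I have in mind on the PyTorch side, a minimal sketch (the Monte Carlo body and the 1000-sample size are stand-ins of my own; only the register_kl decorator itself is the actual API):

import torch
from torch.distributions import Normal, Laplace
from torch.distributions.kl import register_kl, kl_divergence

@register_kl(Normal, Laplace)
def _kl_normal_laplace(p, q):
    # Stand-in body: a Monte Carlo estimate of KL(p || q), not a closed form.
    x = p.rsample((1000,))
    return (p.log_prob(x) - q.log_prob(x)).mean(0)

# With the registration above, the call from before no longer raises:
kl = kl_divergence(Normal(loc=0., scale=1.), Laplace(loc=0., scale=1.))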

Is this a correct assumption? If so, can someone explain why the KL divergence is not implemented for a given pair of distribution classes? I think I am missing something very basic.

If my assumption is wrong, can someone explain how to correctly get TensorFlow and PyTorch to carry out this operation?

For additional reference, this example uses an older version of TensorFlow compatible with Edward:

pip install tensorflow==1.7
pip install edward

In the minimal example above, I am trying to implement the equivalent of the following edward toy example in tfp (or torch):

import tensorflow as tf
import edward as ed

p = ed.models.Normal(loc=0., scale=1.)
s = tf.Variable(1.)
q = ed.models.Laplace(loc=0., scale=s)
inference = ed.KLqp({p: q})
inference.run(n_iter=5000)

1 Answer


IIRC, Edward's KLqp first tries to use the analytic form, and if that's unavailable, switches to using the sample KL.

For TFP, and I think PyTorch as well, kl_divergence only works for registered distribution pairs and, unlike Edward, only computes the analytic KL. As you mention, this pair isn't implemented in TFP, and I would say that's mostly because only the common cases (such as KL(MultivariateNormal || MultivariateNormal)) have been implemented so far.
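For example, a pair that is registered dispatches to its analytic form without complaint (a quick sketch against the same versions as above):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# KL(Normal || Normal) is registered, so this dispatches to the analytic
# form and returns a tensor rather than raising NotImplementedError.
kl = tfd.kl_divergence(tfd.Normal(loc=0., scale=1.), tfd.Normal(loc=1., scale=2.))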

To register the KL divergence, you would do something like: https://github.com/tensorflow/probability/blob/07878168731e0f6d3d0e7c878bdfd5780c16c8d4/tensorflow_probability/python/distributions/gamma.py#L275. (It would be great if you could file a PR at https://github.com/tensorflow/probability!).
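Concretely, the registration looks something like the sketch below. The body here is a placeholder Monte Carlo estimate of my own (the 1000-sample size is arbitrary), not the closed form an actual PR should derive:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

@tfd.RegisterKL(tfd.Normal, tfd.Laplace)
def _kl_normal_laplace(a, b, name=None):
    # Placeholder body: a Monte Carlo estimate of KL(a || b). A registered
    # function receives the two distributions and returns a tensor of KLs.
    x = a.sample(1000)
    return tf.reduce_mean(a.log_prob(x) - b.log_prob(x), axis=0)

# After registration, the call from the question no longer raises:
kl = tfd.kl_divergence(tfd.Normal(loc=0., scale=1.), tfd.Laplace(loc=0., scale=1.))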

If it turns out that there isn't a suitable analytic form for this (off the top of my head, I don't know if there is one), then one can form the sample KL and optimize with that. This can be done explicitly in TFP (by sampling and computing the sample KL). Also, please file a PR if you would like this to be done more automatically; this is something some of us on TFP are interested in.
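As a rough sketch of that explicit route, mirroring the Edward toy example above (the softplus reparameterization, sample size, optimizer, and learning rate are all my own arbitrary choices):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

p = tfd.Normal(loc=0., scale=1.)
s = tf.Variable(1.)
# softplus keeps the trainable scale positive during optimization.
q = tfd.Laplace(loc=0., scale=tf.nn.softplus(s))

# Sample KL(q || p): Laplace sampling in TFP is reparameterized,
# so gradients flow back to the variable s.
x = q.sample(1000)
sample_kl = tf.reduce_mean(q.log_prob(x) - p.log_prob(x))
train_op = tf.train.AdamOptimizer(0.01).minimize(sample_kl)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5000):
        sess.run(train_op)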

It would be interesting to see for what cases analytic KLs can be automated. For instance, if q and p come from the same exponential family, then there is a nice form for the KL divergence in terms of the sufficient statistics and the normalizer. But for KLs across exponential families (or between distributions that aren't exponential families at all), I'm not aware of results on classes of distributions where the KL can be calculated semi-automatically.
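For reference, the nice form alluded to: if both distributions have density h(x) exp(η^T T(x) − A(η)) with natural parameters η1 and η2 and log-normalizer A, then

KL(p_η1 || p_η2) = A(η2) − A(η1) − (η2 − η1)^T ∇A(η1),

since ∇A(η1) = E_{p_η1}[T(x)] is exactly the expected sufficient statistic.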

Answered 2018-10-30T04:42:05.147