I can't figure out why the returned model has 189 parameters in the second layer. The way I count them, there should be more. Why is that?
Here is the code:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import RMSprop

tfd = tfp.distributions
tfpl = tfp.layers

# Define the prior weight distribution -- all N(0, 1) -- and not trainable
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model
# Define variational posterior weight distribution -- multivariate Gaussian
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        # The parameters of the model are declared as Variables that are trainable.
        # The shape of the VariableLayer is the number of parameters needed to create
        # the MultivariateNormalTriL object, given that it lives in a space of n
        # dimensions (event_size = n). This number is returned by
        # tfpl.MultivariateNormalTriL.params_size(n).
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        # The posterior function returns to the variational layer that calls it a
        # MultivariateNormalTriL object with as many dimensions as the parameters of
        # the variational Dense layer. That means each parameter is generated by a
        # distinct Normal distribution, shifted and scaled by a mu and sigma learned
        # from the data, independently of all the other weights. The output of the
        # VariableLayer becomes the input to the MultivariateNormalTriL object.
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
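For reference, my understanding is that `tfpl.MultivariateNormalTriL.params_size(n)` counts `n` entries for the location vector plus `n * (n + 1) / 2` entries for the lower-triangular scale matrix (diagonal included). A minimal sketch of that count in plain Python (not the TFP call itself):

```python
# Sketch of the count MultivariateNormalTriL.params_size(n) is expected to
# return: n for the location vector plus n*(n+1)/2 for the lower-triangular
# scale matrix, diagonal included.
def mvn_tril_params_size(n):
    return n + n * (n + 1) // 2

print(mvn_tril_params_size(3))  # 3 + 6 = 9
```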
# Create probabilistic regression with one hidden layer, weight uncertainty
model = Sequential([
    tfpl.DenseVariational(units=8,
                          input_shape=(1,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1 / x_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1 / x_train.shape[0]),
    tfpl.IndependentNormal(1)
])
def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)

model.compile(loss=nll, optimizer=RMSprop(learning_rate=0.005))
model.summary()
For the second layer, we have 8 inputs (since the first layer has 8 outputs) and 2 outputs, so 16 weights in total. Each weight has its own mean and variance => 2 * 16 = 32 parameters.
Then we have to count the free parameters of the covariance matrix between the 32 parameters. Given the symmetry of the covariance matrix, we only consider the triangular part including the diagonal, so we get (32**2 - 32)/2 + 32 = 528 parameters. Yet the model summary reports only 189.
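The arithmetic above, spelled out as a sketch (the 8-input/2-output shapes are taken from the model definition):

```python
# Reproduce the count described above: 8 inputs x 2 outputs = 16 weights,
# each with a mean and a variance (2 * 16 = 32), plus the free entries of a
# symmetric covariance matrix over those 32 parameters (lower triangle
# including the diagonal).
weights = 8 * 2                        # 16
mean_and_var = 2 * weights             # 32
total = (mean_and_var**2 - mean_and_var) // 2 + mean_and_var
print(total)  # 528
```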