
I can't figure out why the returned model has 189 parameters in the second layer. The way I count them, there should be more. Why is this?

Here is the code:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop

tfd = tfp.distributions
tfpl = tfp.layers

# Define the prior weight distribution -- all N(0, 1) -- and not trainable

def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            # A fixed standard normal over all n weights of the layer
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n),
                                                 scale_diag=tf.ones(n)))
    ])
    return prior_model
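
As a quick sanity check (my illustration, not part of the original code), the prior built for, say, kernel_size = 16 and bias_size = 2 is an 18-dimensional standard normal with nothing to train:

# Illustrative check: the prior ignores its input and has no trainable variables
p = prior(16, 2)
dist = p(tf.constant([[0.0]]))       # the input t is ignored by the lambda
print(dist.event_shape)              # (18,)
print(len(p.trainable_variables))    # 0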

# Define variational posterior weight distribution -- multivariate Gaussian

def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        # The model's parameters are declared as trainable Variables.  The shape
        # of the VariableLayer is the number of parameters needed to build a
        # MultivariateNormalTriL over n dimensions (event_size = n), which is
        # returned by tfpl.MultivariateNormalTriL.params_size(n).
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        # The posterior function returns, to the DenseVariational layer that
        # calls it, a MultivariateNormalTriL object with as many dimensions as
        # the layer has weights.  Its location and lower-triangular scale are
        # learned from the data, so the weights can covary with one another.
        # The output of the VariableLayer becomes the input to the
        # MultivariateNormalTriL layer.
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
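
In general, tfpl.MultivariateNormalTriL.params_size(n) returns n + n * (n + 1) / 2: n entries for the mean vector plus the entries of the lower-triangular scale matrix. A minimal illustration (my addition) for a small n:

# params_size(3) = 3 means + 3 * 4 / 2 = 6 lower-triangular entries = 9
print(tfpl.MultivariateNormalTriL.params_size(3))   # 9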

# Create probabilistic regression with one hidden layer, weight uncertainty

model = Sequential([
    tfpl.DenseVariational(units=8,
                          input_shape=(1,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0]),
    tfpl.IndependentNormal(1)
])

def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)

model.compile(loss=nll, optimizer=RMSprop(learning_rate=0.005))
model.summary()

(Screenshot of the model.summary() output, which reports 189 parameters for the second DenseVariational layer.)
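
The same per-layer counts can also be printed directly (an illustrative check of mine, assuming the model above has been built):

# Illustrative check: print each layer's parameter count as reported by Keras
for layer in model.layers:
    print(layer.name, layer.count_params())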

When it comes to the second layer, we have 8 inputs (given the 8 outputs of the first layer) and 2 outputs, so 16 weights in total. Each has its own mean and variance => 2 * 16 = 32 parameters.

Then we have to count the free parameters of the covariance matrix between those 32 parameters. Given that a covariance matrix is symmetric, we only consider the triangular part including the diagonal, so we get (32**2 - 32)/2 + 32 = 528 parameters. Yet the model summary reports only 189.


1 Answer


8 inputs and 2 outputs means 8 * 2 = 16 kernel weights plus 2 biases, for a total of 18 parameters. Each parameter has its own mean, which gives 18 parameters for the means, plus (18 * 17)/2 + 18 = 171 parameters for the lower-triangular half of the covariance matrix (the variances are not counted separately: they are the diagonal, which is already included in that triangular count). So, exactly as model.summary() correctly indicates, there are 171 + 18 = 189 trainable parameters in total.
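
This is exactly what tfpl.MultivariateNormalTriL.params_size reports for the 18 weights of the second layer (a minimal check, assuming the same tfpl alias as in the question):

# 16 kernel weights + 2 biases = 18 weights
n = 8 * 2 + 2
print(tfpl.MultivariateNormalTriL.params_size(n))   # 189 = 18 means + 171 triangular entries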

answered 2020-11-25T13:36:30.593