tensorflow - Implementing a variational autoencoder with TensorFlow Probability NOT on the MNIST dataset

Question

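I know there are many questions about variational autoencoders. However, this one differs from the existing ones in two ways: 1) it is implemented with TensorFlow 2 and TensorFlow Probability; 2) it does not use MNIST or any other image dataset.

As for the question itself:

I am trying to implement a VAE with TensorFlow Probability and Keras. As part of my research, I want to train and evaluate it on some synthetic datasets. The code is below.

Although the implementation is complete and the loss decreases during training, I run into various errors as soon as I try to evaluate the trained model on my test set.

I am fairly sure the problem is related to the input/output shapes, but unfortunately I have not managed to fix it.

Here is the code: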

import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split


tfd = tfp.distributions


n_epochs = 5
n_features = 2
latent_dim = 1
n_units = 4
learning_rate = 1e-3
n_samples = 400
batch_size = 32

# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
                                 n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=False, random_state=42)

x_in = x_in.astype('float32')
y_in = y_in.astype('float32')  # .reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)

print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)

prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)

train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)

valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)

test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)

encoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
    tfkl.Lambda(lambda x: tf.cast(x, tf.float32)),  # - 0.5
    tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
    tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
    tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
               activation=None, name='mvn_triL1'),
    tfpl.MultivariateNormalTriL(
        # a KL weight other than 1 turns the VAE into a beta-VAE
        latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
])

decoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=[latent_dim], name='dec_input'),
    # tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
    # tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
    tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
])

vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs[0]), name='VAE')

print("encoder:", encoder)
print(" ")
print("encoder.inputs:", encoder.inputs)
print(" ")
print("encoder.outputs:", encoder.outputs)
print(" ")
print("decoder:", decoder)
print(" ")
print("decoder.inputs:", decoder.inputs)
print(" ")
print("decoder.outputs:", decoder.outputs)
print(" ")

# Negative log-likelihood, i.e. -E_{q(z|x)}[log p(x|z)].
# The KL term is already added in the last layer of the encoder via activity_regularizer.
# The loss takes two arguments: the original data points x and the output of the model,
# called rv_x because it is a random variable (a distribution object).
negloglik = lambda x, rv_x: -rv_x.log_prob(x)

vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
            loss=negloglik,)

vae.summary()

history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)

print("x.shape:", x_test.shape)
x_hat = vae(x_test)

print("original:")
print(x_test)
print(" ")
print("Decoded Random Samples:")
print(x_hat.sample())
print(" ")
print("Decoded Means:")
print(x_hat.mean())

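The problem:

With the code above, I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]

As far as I know, we can add as many layers as we like to the decoder before its output layer, as is done in a convolutional VAE, right?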
If I uncomment the following two lines in the decoder:

# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),

I see the following warning (printed four times) and then an error:

WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.

and the error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]
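Both numbers are consistent with the distribution layer reshaping its input to `[batch] + event_shape`. `vae(x_test)` receives all 80 test rows at once (400 samples, 40% held out, then split in half), and `IndependentBernoulli([n_features])` asks for 80 × 2 = 160 logits. Without the hidden layers, the decoder feeds it the 1-dimensional latent code (80 × 1 = 80 values); with them, it feeds the 8-unit `dec_dense2` output (80 × 8 = 640 values). The sketch below reproduces this arithmetic with NumPy's `reshape`, which rejects the same kind of mismatch; the helper name `check` is hypothetical:

```python
import numpy as np

n_test = 80      # 400 * 0.4 * 0.5 rows after the two train_test_split calls
n_features = 2   # IndependentBernoulli([n_features]) needs 2 logits per row


def check(width):
    """Try to reshape an (n_test, width) layer output into (n_test, n_features)."""
    outputs = np.zeros((n_test, width), dtype=np.float32)
    try:
        outputs.reshape(n_test, n_features)
        return f"width={width}: {outputs.size} values, OK"
    except ValueError:
        return f"width={width}: {outputs.size} values, but {n_test * n_features} requested"


# 1: no hidden layers (latent code), 8: dec_dense2 output, 2: params_size([n_features])
for width in (1, 8, 2):
    print(check(width))
```

Only a final layer of width 2 (that is, `tfpl.IndependentBernoulli.params_size([n_features])`) reshapes cleanly.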

So my question is: why are the decoder layers not used during training, as the warning says?
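The missing gradients most likely come from the input pipeline, not the decoder. `from_tensor_slices(x_train)` yields only inputs, so Keras unpacks each batch as `(x, y=None)`; in TF 2.x Keras a compiled loss whose target is `None` is skipped, leaving only the KL activity regularizer from the encoder. That would explain why the loss still decreases, why the decoder variables get no gradients, and probably why the reshape error only appears when `vae(x_test)` is called eagerly. A minimal sketch of the difference, using a random stand-in for the question's `x_train` (240 rows, matching the 60% training split):

```python
import numpy as np
import tensorflow as tf

# Random stand-in for x_train from the question: 240 rows x 2 features.
x_train = np.random.default_rng(0).normal(size=(240, 2)).astype('float32')
batch_size = 32

# Yields x only: Keras sees y=None, so negloglik(x, rv_x) is never evaluated.
x_only = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)

# Yields (x, x): an autoencoder's target is its own input, so the
# reconstruction term negloglik(x, rv_x) is computed and reaches the decoder.
x_pairs = tf.data.Dataset.from_tensor_slices((x_train, x_train)).batch(batch_size)

print(x_only.element_spec)   # a single TensorSpec
print(x_pairs.element_spec)  # a (TensorSpec, TensorSpec) tuple

x_batch, y_batch = next(iter(x_pairs))
```

With paired datasets for training and validation, `vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset)` can stay as it is. The same applies when passing NumPy arrays directly: the target has to be given explicitly, e.g. `vae.fit(x_train, x_train, validation_data=(x_val, x_val))` and `vae.evaluate(x_test, x_test)`.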

PS: I also tried passing x_train, x_val, and x_test directly during training and evaluation, but it did not help.

Any help would be greatly appreciated.
