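I know there are many questions about variational autoencoders. However, this one differs from the existing ones in two ways: 1) it is implemented with TensorFlow 2 and TensorFlow Probability (tensorflow_probability); 2) it does not use MNIST or any other image dataset.
As for the question itself:
I am trying to implement a VAE with TensorFlow Probability and Keras. As part of my research, I want to train and evaluate it on some synthetic datasets. The code is below.
The implementation is complete and the loss decreases during training. However, as soon as I try to evaluate the trained model on my test set, I run into various errors.
I am fairly sure the problem is related to the input/output shapes, but unfortunately I have not managed to solve it.
Here is the code: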
import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split
tfd = tfp.distributions
n_epochs = 5
n_features = 2
latent_dim = 1
n_units = 4
learning_rate = 1e-3
n_samples = 400
batch_size = 32
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
                                 n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=False, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32') # .reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)
encoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
    tfkl.Lambda(lambda x: tf.cast(x, tf.float32)),  # - 0.5
    tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
    tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
    tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
               activation=None, name='mvn_triL1'),
    tfpl.MultivariateNormalTriL(
        # set weight to something other than 1 (e.g. num_train_samples) to turn the VAE into a beta-VAE
        latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
])
decoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=latent_dim, name='dec_input'),
    # tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
    # tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
    tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
])
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs), name='VAE')
print("encoder:", encoder)
print(" ")
print("encoder.inputs:", encoder.inputs)
print(" ")
print(" encoder.outputs:", encoder.outputs)
print(" ")
print("decoder:", decoder)
print(" ")
print("decoder.inputs:", decoder.inputs)
print(" ")
print("decoder.outputs:", decoder.outputs)
print(" ")
# Negative log-likelihood, i.e. -E_{q(z|x)}[log p(x|z)]. Only the reconstruction term is needed here,
# because the KL term was already added in the last layer of the encoder via activity_regularizer.
# This loss function takes two arguments: the original data points x and the output of the model,
# which we call rv_x (because it is a random variable).
negloglik = lambda x, rv_x: -rv_x.log_prob(x)
vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
            loss=negloglik,)
vae.summary()
history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)
print("x.shape:", x_test.shape)
x_hat = vae(x_test)
print("original:")
print(x_test)
print(" ")
print("Decoded Random Samples:")
print(x_hat.sample())
print(" ")
print("Decoded Means:")
print(x_hat.mean())
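For reference, the sizes of the three splits can be worked out by hand. The sketch below assumes the rounding rule sklearn's train_test_split applies to a float test_size (the test part is rounded up, the train part gets the remainder); split_sizes is a hypothetical helper, not part of sklearn. With n_samples = 400 it gives 240 training, 80 test and 80 validation points, so x_test has 80 rows.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper mimicking train_test_split's rounding for a float test_size:
    the test part is rounded up and the train part receives the remainder."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# First split: x_train vs. the 40% that is split again afterwards.
n_train, n_rest = split_sizes(400, 0.4)
# Second split: its "train" part becomes x_test, its "test" part becomes x_val.
n_test, n_val = split_sizes(n_rest, 0.5)
print(n_train, n_test, n_val)  # 240 80 80
```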
Problems:
- With the code above, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]
As far as I know, we can add as many layers as we like to the decoder before its output layer, as is done in convolutional VAEs. Is that right?
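Below is a framework-free NumPy sketch of the shape contract the decoder appears to need. It is an illustration only, not the TFP implementation. It assumes that IndependentBernoulli expects the last dimension of its input to equal IndependentBernoulli.params_size([n_features]), which for a Bernoulli event of shape [n_features] is n_features = 2. Under that assumption, hidden layers of any width can sit in front of the output layer, as long as the last one projects to exactly n_features logits. The helpers dense and bernoulli_log_prob are hypothetical. The targets are random binary values because a Bernoulli likelihood is defined on {0, 1}.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, n_out, activation=None):
    """Hypothetical stand-in for a randomly initialised fully connected layer."""
    w = rng.normal(scale=0.5, size=(x.shape[-1], n_out)).astype(np.float32)
    b = np.zeros(n_out, dtype=np.float32)
    y = x @ w + b
    return np.maximum(y, 0.0) if activation == "relu" else y


def bernoulli_log_prob(x, logits):
    """Independent Bernoulli over the last axis: sum of per-feature log-probabilities.
    log p(x) = x * logits - softplus(logits), with softplus computed via logaddexp."""
    return np.sum(x * logits - np.logaddexp(0.0, logits), axis=-1)


latent_dim, n_units, n_features, batch = 1, 4, 2, 80
z = rng.normal(size=(batch, latent_dim)).astype(np.float32)  # stand-in for encoder samples

h = dense(z, n_units, "relu")       # like dec_dense1: any width works here
h = dense(h, n_units * 2, "relu")   # like dec_dense2: any width works here
logits = dense(h, n_features)       # projection to params_size = n_features logits
x = rng.integers(0, 2, size=(batch, n_features)).astype(np.float32)  # binary targets

print(z.shape, h.shape, logits.shape)       # (80, 1) (80, 8) (80, 2)
print(bernoulli_log_prob(x, logits).shape)  # (80,)
```

In Keras terms, this would presumably mean ending the decoder with something like tfkl.Dense(tfpl.IndependentBernoulli.params_size([n_features])) before the IndependentBernoulli layer. That is an untested assumption. Note also that make_classification produces real-valued features, so a Bernoulli output distribution may not be the most natural choice for this data.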
If I uncomment the following two lines in the decoder:
# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
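I see the following warning, followed by an error:
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
and the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]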
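Both error messages are consistent with simple reshape arithmetic. This assumes IndependentBernoulli reshapes its input to [batch, n_features], which is what the Op:Reshape errors suggest. The 80-row test set then needs 80 × 2 = 160 values. The decoder delivers 80 × 1 = 80 values when it has no Dense layers, because its input is the 1-D latent code. It delivers 80 × 8 = 640 values when dec_dense2 (int(n_units * 2) = 8 units) is its last layer. The NumPy stand-in below reproduces the three cases; reshape_check is a hypothetical helper.

```python
import numpy as np


def reshape_check(n_rows, width, n_features):
    """Hypothetical helper: can an (n_rows, width) activation be reshaped to (n_rows, n_features)?
    Returns the number of values and whether the reshape succeeds."""
    params = np.zeros((n_rows, width), dtype=np.float32)  # stand-in for decoder activations
    try:
        params.reshape(n_rows, n_features)
        return params.size, True
    except ValueError:
        return params.size, False


n_test, n_features = 80, 2
print(reshape_check(n_test, 1, n_features))  # latent code fed straight in: (80, False)
print(reshape_check(n_test, 8, n_features))  # dec_dense2 as last layer:   (640, False)
print(reshape_check(n_test, 2, n_features))  # n_features logits:         (160, True)
```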
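My question now is: why are the decoder layers not used during training, as the warning indicates?
P.S. I also tried passing x_train, x_val and x_test directly during training and evaluation, but it did not help.
Any help would be greatly appreciated.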