我在玩变分自动编码器时偶然发现了一个奇怪的现象。这个问题描述起来很简单:
在为 VAE 定义损失函数时,您必须使用某种重构误差。我决定使用我自己的交叉熵实现,因为我无法使用 tensorflow 提供的任何函数获得合理的结果。它看起来像这样:
x_hat = tf.contrib.layers.fully_connected(fc2,
input_dim,
activation_fn=tf.sigmoid)
## Define the loss
reconstruction_loss = -tf.reduce_sum(
x * tf.log(epsilon + x_hat) +
(1 - x) * tf.log(epsilon + 1 - x_hat),
axis=1)
它使用重构层的输出,该层应用 sigmoid 函数将其变为 [0; 1]范围。现在,我想在损失函数中应用 sigmoid 并将其更改为
x_hat = tf.contrib.layers.fully_connected(fc2,
input_dim,
activation_fn=None)
## Define the loss
reconstruction_loss = -tf.reduce_sum(
x * tf.log(epsilon + tf.sigmoid(x_hat)) +
(1 - x) * tf.log(epsilon + 1 - tf.sigmoid(x_hat)),
axis=1)
我相信这应该提供几乎相同的结果。然而,在实践中,第二次尝试会导致奇怪的灰色图片。原件看起来也模糊且明亮得多。首先是好的版本,然后是替代的“错误”版本。
有人可以向我解释导致这种奇怪行为的原因吗?
如果你想自己测试,下面是我的源代码。您必须注释相应的块或注释以获得结果。谢谢!
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
import numpy as np
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
n_samples = mnist.train.num_examples
input_dim = mnist.train.images[0].shape[0]
inter_dim = 256
encoding_dim = 5
epsilon = 1e-10
learning_rate = 1e-4
n_epochs = 20
batch_size = 100
width = 28
## Define the variational autoencoder model
x = tf.placeholder(dtype=tf.float32,
shape=[None, input_dim],
name='x')
fc1 = tf.contrib.layers.fully_connected(x,
inter_dim,
activation_fn=tf.nn.relu)
z_mean = tf.contrib.layers.fully_connected(fc1,
encoding_dim,
activation_fn=None)
z_log_var = tf.contrib.layers.fully_connected(fc1,
encoding_dim,
activation_fn=None)
eps = tf.random_normal(shape=tf.shape(z_log_var),
mean=0,
stddev=1,
dtype=tf.float32)
z = z_mean + tf.exp(z_log_var / 2) * eps
fc2 = tf.contrib.layers.fully_connected(z,
inter_dim,
activation_fn=tf.nn.relu)
x_hat = tf.contrib.layers.fully_connected(fc2,
input_dim,
activation_fn=tf.sigmoid)
#activation_fn=None)
## Define the loss
reconstruction_loss = -tf.reduce_sum(
x * tf.log(epsilon + x_hat) +
(1 - x) * tf.log(epsilon + 1 - x_hat),
axis=1)
ALTERNATIVE LOSS W/ APPLYING SIGMOID, REMOVED ACTIVATION FROM OUTPUT LAYER
'''
reconstruction_loss = -tf.reduce_sum(
x * tf.log(epsilon + tf.sigmoid(x_hat)) +
(1 - x) * tf.log(epsilon + 1 - tf.sigmoid(x_hat)),
axis=1)
'''
KL_div = -.5 * tf.reduce_sum(
1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
axis=1)
total_loss = tf.reduce_mean(reconstruction_loss + KL_div)
## Define the training operator
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(total_loss)
## Run it
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(n_epochs):
for _ in range(n_samples // batch_size):
batch = mnist.train.next_batch(batch_size)
_, loss, recon_loss, KL_loss = sess.run([train_op,
total_loss,
reconstruction_loss,
KL_div],
feed_dict={x:batch[0]})
print('[Epoch {}] loss: {}'.format(epoch, loss))
print('Training Done')
## Reconstruct a few samples to validate the training
batch = mnist.train.next_batch(100)
x_reconstructed = sess.run(x_hat, feed_dict={x:batch[0]})
n = np.sqrt(batch_size).astype(np.int32)
I_reconstructed = np.empty((width*n, 2*width*n))
for i in range(n):
for j in range(n):
x = np.concatenate(
(x_reconstructed[i*n+j, :].reshape(width, width),
batch[0][i*n+j, :].reshape(width, width)),
axis=1
)
I_reconstructed[i*width:(i+1)*width, j*2*width:(j+1)*2*width] = x
fig = plt.figure()
plt.imshow(I_reconstructed, cmap='gray')
EDIT1:解决方案
感谢@xdurch0,我意识到重建的输出不再通过 sigmoid 函数重新缩放。这意味着必须在绘制图像之前将 sigmoid 应用到图像上。只需修改输出:
x_reconstructed = sess.run(tf.sigmoid(x_hat), feed_dict={x:batch[0]})