
While playing around with variational autoencoders, I stumbled upon a strange phenomenon. The problem is easy to describe:

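When defining the loss function of a VAE, you have to use some kind of reconstruction error. I decided to use my own implementation of cross-entropy, because I could not get reasonable results with any of the functions TensorFlow provides. It looks like this: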

x_hat = tf.contrib.layers.fully_connected(fc2,
                                  input_dim,
                                  activation_fn=tf.sigmoid)

## Define the loss

reconstruction_loss = -tf.reduce_sum(
    x * tf.log(epsilon + x_hat) + 
    (1 - x) * tf.log(epsilon + 1 - x_hat),
    axis=1) 

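It uses the output of the reconstruction layer, which applies the sigmoid function to squash it into the [0, 1] range. Now I wanted to apply the sigmoid inside the loss function instead, so I changed it to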

x_hat = tf.contrib.layers.fully_connected(fc2,
                                  input_dim,
                                  activation_fn=None)

## Define the loss

reconstruction_loss = -tf.reduce_sum(
    x * tf.log(epsilon + tf.sigmoid(x_hat)) + 
    (1 - x) * tf.log(epsilon + 1 - tf.sigmoid(x_hat)),
    axis=1) 

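I expected this to give almost identical results. In practice, however, the second attempt produces strange gray pictures, and the originals also look blurry and much brighter. Below are the good version first, then the alternative "wrong" version.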
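As far as the loss value is concerned, the two variants really are the same computation; moving the sigmoid into the loss only changes what the tensor `x_hat` holds (probabilities in the first variant, raw logits in the second). The NumPy sketch below checks this under the assumption that random logits and binary targets can stand in for the decoder output and the MNIST pixels; the helper names are hypothetical. It also compares against the numerically stable logits form `max(z, 0) - z * x + log(1 + exp(-|z|))`, which is the formula TensorFlow documents for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import numpy as np

EPSILON = 1e-10  # same guard value as in the question


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_from_probs(x, p, eps=EPSILON):
    # Variant 1: the output layer already applied the sigmoid, p is in (0, 1).
    return -np.sum(x * np.log(eps + p) + (1 - x) * np.log(eps + 1 - p), axis=1)


def bce_from_logits_naive(x, logits, eps=EPSILON):
    # Variant 2: linear output layer, the sigmoid is moved into the loss.
    return bce_from_probs(x, sigmoid(logits), eps)


def bce_from_logits_stable(x, logits):
    # Algebraically the same loss, rewritten so no epsilon is needed.
    return np.sum(np.maximum(logits, 0) - logits * x
                  + np.log1p(np.exp(-np.abs(logits))), axis=1)


rng = np.random.default_rng(0)
logits = rng.normal(scale=3.0, size=(4, 784))        # stand-in for the decoder output
x = (rng.random((4, 784)) > 0.8).astype(np.float64)  # stand-in for binarized pixels

loss_probs = bce_from_probs(x, sigmoid(logits))
loss_logits = bce_from_logits_naive(x, logits)
loss_stable = bce_from_logits_stable(x, logits)

print(np.allclose(loss_probs, loss_logits))
print(np.allclose(loss_logits, loss_stable))
```

Since the per-example losses agree, the training objective is unchanged by the move; what changes is the meaning of the values that `x_hat` returns when it is evaluated later.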

[Images: reconstructions with the original code | reconstructions with the second attempt]

Can someone explain to me what causes this strange behavior?

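If you want to test it yourself, my source code is below. You have to comment or uncomment the corresponding blocks to get each result. Thanks!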

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
import numpy as np

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
n_samples = mnist.train.num_examples
input_dim = mnist.train.images[0].shape[0]
inter_dim = 256
encoding_dim = 5
epsilon = 1e-10
learning_rate = 1e-4
n_epochs = 20
batch_size = 100
width = 28

## Define the variational autoencoder model 

x = tf.placeholder(dtype=tf.float32,
               shape=[None, input_dim],
               name='x')

fc1 = tf.contrib.layers.fully_connected(x,
                                   inter_dim,
                                   activation_fn=tf.nn.relu)

z_mean = tf.contrib.layers.fully_connected(fc1,
                                       encoding_dim,
                                       activation_fn=None)
z_log_var = tf.contrib.layers.fully_connected(fc1,
                                          encoding_dim,
                                          activation_fn=None)

eps = tf.random_normal(shape=tf.shape(z_log_var),
                   mean=0,
                   stddev=1,
                   dtype=tf.float32)
z = z_mean + tf.exp(z_log_var / 2) * eps

fc2 = tf.contrib.layers.fully_connected(z,
                                    inter_dim,
                                    activation_fn=tf.nn.relu)

x_hat = tf.contrib.layers.fully_connected(fc2,
                                      input_dim,
                                      activation_fn=tf.sigmoid)
                                     #activation_fn=None)
## Define the loss

reconstruction_loss = -tf.reduce_sum(
    x * tf.log(epsilon + x_hat) + 
    (1 - x) * tf.log(epsilon + 1 - x_hat),
    axis=1) 

# ALTERNATIVE LOSS: apply the sigmoid here and remove the activation from the output layer (activation_fn=None above)
'''
reconstruction_loss = -tf.reduce_sum(
    x * tf.log(epsilon + tf.sigmoid(x_hat)) + 
    (1 - x) * tf.log(epsilon + 1 - tf.sigmoid(x_hat)),
    axis=1)
'''

KL_div = -.5 * tf.reduce_sum(
    1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
    axis=1)

total_loss = tf.reduce_mean(reconstruction_loss + KL_div)

## Define the training operator

train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(total_loss)

## Run it

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for epoch in range(n_epochs):
        for _ in range(n_samples // batch_size):
            batch = mnist.train.next_batch(batch_size)

            _, loss, recon_loss, KL_loss = sess.run([train_op,
                                                total_loss,
                                                reconstruction_loss,
                                                KL_div],
                                        feed_dict={x:batch[0]})
        print('[Epoch {}] loss: {}'.format(epoch, loss))
    print('Training Done')

    ## Reconstruct a few samples to validate the training

    batch = mnist.train.next_batch(100)

    x_reconstructed = sess.run(x_hat, feed_dict={x:batch[0]})

    n = np.sqrt(batch_size).astype(np.int32)
    I_reconstructed = np.empty((width*n, 2*width*n))
    for i in range(n):
        for j in range(n):
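            # Reconstruction (left) next to the original (right); named `pair`
            # so it does not shadow the placeholder `x`.
            pair = np.concatenate(
                (x_reconstructed[i*n+j, :].reshape(width, width),
                 batch[0][i*n+j, :].reshape(width, width)),
                axis=1
            )
            I_reconstructed[i*width:(i+1)*width, j*2*width:(j+1)*2*width] = pair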

    fig = plt.figure()
    plt.imshow(I_reconstructed, cmap='gray')
    plt.show()
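The `KL_div` term in the script is the closed-form KL divergence between the diagonal Gaussian posterior N(z_mean, exp(z_log_var)) and a standard normal prior. The NumPy sketch below is not part of the original code, and its function names and test values are hypothetical. It compares that closed form with a Monte Carlo estimate built from the same reparameterization `z = z_mean + exp(z_log_var / 2) * eps` that the script uses.

```python
import numpy as np


def kl_closed_form(z_mean, z_log_var):
    # Same expression as KL_div in the script, summed over the latent axis.
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)


def kl_monte_carlo(z_mean, z_log_var, n_samples=200_000, seed=0):
    # Estimate E_q[log q(z) - log p(z)] with reparameterized samples.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples,) + z_mean.shape)
    z = z_mean + np.exp(z_log_var / 2) * eps
    # log-density of N(z_mean, exp(z_log_var)); (z - z_mean)^2 / var equals eps^2
    log_q = -0.5 * (np.log(2 * np.pi) + z_log_var + eps ** 2)
    # log-density of the N(0, 1) prior
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return np.mean(np.sum(log_q - log_p, axis=-1))


# encoding_dim = 5 as in the question; the values themselves are made up
z_mean = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
z_log_var = np.array([-0.5, 0.2, 0.0, -1.0, 0.7])
print(kl_closed_form(z_mean, z_log_var))
print(kl_monte_carlo(z_mean, z_log_var))
```

The two numbers should agree up to Monte Carlo noise, which is why the script can use the cheap closed form instead of sampling.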

EDIT 1: Solution

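Thanks to @xdurch0, I realized that the reconstructed output is no longer rescaled by the sigmoid function. This means the sigmoid has to be applied to the images before plotting them. Simply modify the output: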

x_reconstructed = sess.run(tf.sigmoid(x_hat), feed_dict={x:batch[0]})
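Why the unscaled output looks gray: for a 2-D float array and no `vmin`/`vmax`, `plt.imshow` linearly maps the array's minimum to black and its maximum to white. In the second variant, `I_reconstructed` mixes unbounded logits with the original pixels in [0, 1], so the whole mosaic, originals included, is rescaled to the logit range. The NumPy sketch below mimics that default normalization on a tiny mosaic; the logit values are made up for illustration, and applying `sigmoid` in NumPy plays the role of `tf.sigmoid(x_hat)`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def imshow_default_scale(img):
    # Mimic matplotlib's default linear normalization: min -> 0 (black), max -> 1 (white).
    return (img - img.min()) / (img.max() - img.min())


original = np.array([[0.0, 1.0], [0.0, 1.0]])    # background and stroke pixels in [0, 1]
logits = np.array([[-15.0, 6.0], [-12.0, 8.0]])  # hypothetical decoder output without a sigmoid

# Reconstruction on the left, original on the right, as in the plotting loop.
raw_mosaic = imshow_default_scale(np.concatenate((logits, original), axis=1))
fixed_mosaic = imshow_default_scale(np.concatenate((sigmoid(logits), original), axis=1))

print(raw_mosaic[:, 2:])    # originals squeezed into a narrow, bright gray band
print(fixed_mosaic[:, 2:])  # originals span black to white again
```

Passing `vmin=0, vmax=1` to `plt.imshow` would pin the gray scale, but the reconstructions would still be logits rather than pixel intensities, so applying the sigmoid before plotting is the actual fix.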