tensorflow - Using batch norm when restoring a model?

Question

I have a small problem with batch normalization when restoring a model in TensorFlow.

Below is my batch norm function, taken from here:

def _batch_normalization(self, input_tensor, is_training, batch_norm_epsilon, decay=0.999):
    """batch normalization for dense nets.

    Args:
        input_tensor: `tensor`, the input tensor that needs to be normalized.
        is_training: `bool`, if true then update the mean/variance using a moving average,
                             else use the stored mean/variance.
        batch_norm_epsilon: `float`, param for batch normalization.
        decay: `float`, param for updating the moving average, default is 0.999.

    Returns:
        the normalized tensor.
    """
    # batch normalization is applied per channel (the last dimension).
    input_shape_channels = int(input_tensor.get_shape()[-1])

    # scale and beta are used in the formula: scale * (x - E(x)) / sqrt(var(x) + epsilon) + beta
    scale = tf.Variable(tf.ones([input_shape_channels]))
    beta = tf.Variable(tf.zeros([input_shape_channels]))

    # global mean and var are the moving-averaged mean and variance.
    global_mean = tf.Variable(tf.zeros([input_shape_channels]), trainable=False)
    global_var = tf.Variable(tf.ones([input_shape_channels]), trainable=False)

    # if training, update the moving mean and var; otherwise use the stored mean/var directly.
    if is_training:
        # batch norm in the channel axis.
        axis = list(range(len(input_tensor.get_shape()) - 1))
        batch_mean, batch_var = tf.nn.moments(input_tensor, axes=axis)

        # update the mean and var.
        train_mean = tf.assign(global_mean, global_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(global_var, global_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(input_tensor,
                                             batch_mean, batch_var, beta, scale, batch_norm_epsilon)
    else:
        return tf.nn.batch_normalization(input_tensor,
                                         global_mean, global_var, beta, scale, batch_norm_epsilon)

I train the model and save it with tf.train.Saver(). Here is the test code:

def inference(self, images_for_predict):
    """load the pre-trained model and do the inference.

    Args:
        images_for_predict: `tensor`, images for predict using the pre-trained model.

    Returns:
        the predict labels.
    """

    tf.reset_default_graph()
    images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)

    predictions = []
    correct = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # saver = tf.train.import_meta_graph('./models/dense_nets_model/dense_nets.ckpt.meta')
        # saver.restore(sess, tf.train.latest_checkpoint('./models/dense_nets_model/'))
        saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt')
        for i in range(100):
            pred, corr = sess.run([tf.argmax(prediction, 1), accuracy],
                                  feed_dict={
                                      images: [images_for_predict.images[i]],
                                      labels: [images_for_predict.labels[i]]})
            correct += corr
            predictions.append(pred[0])
    print("PREDICTIONS:", predictions)
    print("ACCURACY:", correct / 100)

But the predictions are always bad, like this:

('PREDICTIONS:', [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

('ACCURACY:', 0.080000000000000002)

Some notes: images_for_predict = mnist.test, and the self._build_graph method takes two parameters: batch_size and is_training.

Can anyone help me?

score 9 · Accepted Answer

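After trying many approaches, I solved this problem. Here is what I did.

First, thanks to @gdelab: I switched to tf.layers.batch_normalization, so my batch norm function now looks like this: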

def _batch_normalization(self, input_tensor, is_training):
    return tf.layers.batch_normalization(input_tensor, training=is_training)

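The is_training parameter is a placeholder, defined like this: is_training = tf.placeholder(tf.bool)

When building the graph, remember to add this code where you define the optimizer: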

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = tf.train.AdamOptimizer(self.learning_rate).minimize(cross_entropy)

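This is needed because the ops that tf.layers.batch_normalization adds to update the moving mean and variance are not automatically added as dependencies of the train op, so if you do nothing extra, they never run.

Then train the network. After training finishes, save the model with code like this: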
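To see why the missing dependency matters, here is a minimal pure-Python sketch of the moving-average rule (the same form as in the question's code: moving * decay + batch * (1 - decay)). It is not TensorFlow code; the function name and the numbers are assumptions chosen only for illustration.

```python
def update_moving_average(moving, batch_value, decay=0.999):
    """One exponential-moving-average step, as used for batch norm statistics.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    return moving * decay + batch_value * (1 - decay)


true_mean = 5.0       # assumed per-channel mean of the training data
never_updated = 0.0   # initial value; it stays here if the update ops never run

# Pretend every training step sees a batch whose mean equals true_mean.
after_100 = 0.0
for _ in range(100):
    after_100 = update_moving_average(after_100, true_mean)

after_10000 = 0.0
for _ in range(10000):
    after_10000 = update_moving_average(after_10000, true_mean)

print(never_updated, after_100, after_10000)
```

If the update ops never run, inference normalizes with the initial statistics rather than the data's statistics. With decay=0.999 the moving value also needs several thousand steps to get close to the true one, since after k steps it has absorbed a fraction 1 - 0.999**k of it.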

saver = tf.train.Saver(var_list=tf.global_variables())
savepath = saver.save(sess, 'here_is_your_personal_model_path')

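Note that the var_list=tf.global_variables() argument makes sure TensorFlow saves all variables, including the moving mean/variance, which are marked as non-trainable.

When restoring and testing the model, do this: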

# build the graph like training:
images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
saver = tf.train.Saver()
saver.restore(sess, 'here_is_your_personal_model_path')

Now you can test your model. Hope this helps, thanks!

score 4 · Accepted Answer

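Given your implementation of batch norm, when you load the model you need to keep the graph built with images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False) and load only the weight values from the checkpoint, not the meta graph. I think saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt') now also restores the meta graph (sorry if I'm wrong), so you need to restore only the "data" part of it.

Otherwise, you are just using the training graph, in which the mean and variance used by batch norm are the ones computed from the current batch. But at test time your batch has size 1, so normalizing by the batch's own mean and variance always maps your data to 0, hence the constant output.

In any case, I'd suggest using tf.layers.batch_normalization instead, with an is_training placeholder that you will need to feed to your network...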
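The size-1 effect described above can be checked without TensorFlow. The following is a minimal pure-Python sketch, not the asker's code: normalize_with_batch_stats is a hypothetical helper that applies the per-channel formula (x - mean) / sqrt(var + eps), assuming scale = 1 and beta = 0, with the mean and variance computed from the batch itself.

```python
import math


def normalize_with_batch_stats(batch, eps=1e-3):
    """Normalize each channel of `batch` (a list of rows) with the batch's own statistics.

    Hypothetical helper for illustration; scale=1 and beta=0 are assumed.
    """
    n = len(batch)
    channels = len(batch[0])
    means = [sum(row[c] for row in batch) / n for c in range(channels)]
    variances = [sum((row[c] - means[c]) ** 2 for row in batch) / n
                 for c in range(channels)]
    return [[(row[c] - means[c]) / math.sqrt(variances[c] + eps)
             for c in range(channels)]
            for row in batch]


# Two very different inputs, each normalized as a batch of size 1.
single_a = normalize_with_batch_stats([[0.2, 5.0, -3.0]])
single_b = normalize_with_batch_stats([[9.0, -1.0, 42.0]])

# The same two inputs normalized together as one batch of size 2.
pair = normalize_with_batch_stats([[0.2, 5.0, -3.0], [9.0, -1.0, 42.0]])

print(single_a, single_b, pair)
```

With a batch of one, the batch mean equals the sample itself, so every input collapses to the same all-zero activation and the layers after it cannot tell inputs apart. With two samples in the batch the rows stay distinct.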
