tensorflow - Inception network BatchNorm layers return None gradients

Question

Hi, I am trying to fine-tune the Inception network with a custom loss function. It is a triplet loss.

This function comes from facenet.py:

import tensorflow as tf

def triplet_loss(value, alpha):
    """Calculate the triplet loss according to the FaceNet paper

    Args:
      value: the embeddings for the anchor, positive, negative images.
      alpha: the margin between positive and negative pair distances.

    Returns:
      the triplet loss according to the FaceNet paper as a float tensor.
    """
    # tf.split with an integer count requires the batch size to be evenly divisible by 3
    anchor, positive, negative = tf.split(value, num_or_size_splits=3, axis=0)

    with tf.variable_scope('triplet_loss'):
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)

        basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
        # TODO: added by me
        tf.add_to_collection('losses', loss)
    return loss

Note: the value argument is the output of the logits layer, before softmax.
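To make the arithmetic of the loss concrete, here is a minimal pure-Python sketch of the same computation on a tiny hand-made batch. It is not the facenet.py code: the name `triplet_loss_reference`, the sample embeddings, and the margin of 0.5 are assumptions chosen for illustration. It only mirrors the layout that `tf.split` implies, with the first third of the batch as anchors, the second third as positives, and the last third as negatives.

```python
def triplet_loss_reference(embeddings, alpha):
    """Mean hinge triplet loss for a batch laid out as [anchors, positives, negatives]."""
    if len(embeddings) % 3 != 0:
        raise ValueError("batch size must be divisible by 3")
    n = len(embeddings) // 3
    anchors = embeddings[:n]
    positives = embeddings[n:2 * n]
    negatives = embeddings[2 * n:]

    def squared_distance(u, v):
        # Sum of squared coordinate differences, like reduce_sum(square(u - v), 1)
        return sum((a - b) ** 2 for a, b in zip(u, v))

    per_triplet = []
    for anchor, positive, negative in zip(anchors, positives, negatives):
        basic_loss = (squared_distance(anchor, positive)
                      - squared_distance(anchor, negative)
                      + alpha)
        per_triplet.append(max(basic_loss, 0.0))  # hinge: easy triplets contribute 0
    return sum(per_triplet) / n


# Hypothetical 2-D embeddings for two triplets
batch = [
    [0.0, 0.0], [1.0, 0.0],   # anchors
    [0.0, 1.0], [1.0, 1.0],   # positives
    [3.0, 0.0], [1.0, 0.5],   # negatives
]
loss = triplet_loss_reference(batch, alpha=0.5)
```

In this batch the first triplet already satisfies the margin (1 - 9 + 0.5 is negative, so it is clipped to 0), while the second violates it (1 - 0.25 + 0.5 = 1.25), so only the second triplet contributes to the mean.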

When I compute the gradients, I find that BatchNorm/moving_mean and BatchNorm/moving_variance have no gradients. Why does it return None for their gradient values?

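By visualizing the graph, I found that there is no data flow from the loss to the BatchNorm scope. Why do the weights have a data flow from the loss node while BatchNorm does not?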

score 0 · Accepted Answer

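The None gradients belong only to the batchNorm layers, so I did some research on batchNorm. After reading the blog post http://ruishu.io/2016/12/27/batchnorm/ I found the following:

Batch normalization behaves differently during training and testing.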

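Training

Normalize layer activations according to mini-batch statistics. During the training step, update the approximation of the population statistics via a moving average of the mini-batch statistics.

Testing

Normalize layer activations according to the estimated population statistics. Do not update the population statistics according to mini-batch statistics from test data.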
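The two modes can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not TensorFlow's implementation: the class name `ToyBatchNorm`, the `decay` and `eps` defaults, and the single scalar feature are all made up for the example, and the learnable scale and shift are left out. Note that in the training branch the output is computed from the mini-batch statistics only, and the moving statistics are changed by plain assignment.

```python
class ToyBatchNorm:
    """Toy batch normalization over a list of scalar activations (one feature)."""

    def __init__(self, decay=0.9, eps=1e-5):
        self.decay = decay            # assumed moving-average decay
        self.eps = eps                # small constant to avoid division by zero
        self.moving_mean = 0.0        # population mean estimate
        self.moving_variance = 1.0    # population variance estimate

    def __call__(self, batch, is_training):
        if is_training:
            # Training: normalize with the statistics of this mini-batch
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # Update the population estimates with a moving average (assignment only)
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_variance = (self.decay * self.moving_variance
                                    + (1 - self.decay) * var)
        else:
            # Testing: normalize with the stored estimates and leave them untouched
            mean, var = self.moving_mean, self.moving_variance
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = ToyBatchNorm(decay=0.5)
train_out = bn([1.0, 3.0], is_training=True)   # uses batch mean 2.0, variance 1.0
test_out = bn([1.0, 3.0], is_training=False)   # uses moving mean 1.0, variance 1.0
```

The same input gives different outputs in the two modes, because the test-time call normalizes with the moving estimates rather than with the statistics of the batch it is given.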

After I set the phase key to training in the inference function, the problem was solved.
