python - My TensorFlow FCN's weights all drop to 0

Question

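I am trying to implement a fully convolutional network (5 layers) in TensorFlow. After a few training batches, however, all of my logits drop to 0. Has anyone run into the same problem before?

Here is how I implemented the CONV-ReLU-maxPOOL layers: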

import tensorflow as tf

def conv_relu_layer (in_data, nb_filters, filter_shape) :
    # Number of input channels, read from the last (NHWC) dimension
    nb_in_channels = int (in_data.shape[3])
    conv_shape = [filter_shape[0], filter_shape[1],
                  nb_in_channels, nb_filters]

    weights = tf.Variable (
        tf.truncated_normal (conv_shape, mean=0., stddev=.05))
    bias    = tf.Variable (
        tf.truncated_normal ([nb_filters], mean=0., stddev=1.))

    # Stride-1 convolution, then bias and ReLU
    output = tf.nn.conv2d (in_data, weights,
                           [1,1,1,1], padding="SAME")
    output += bias
    output = tf.nn.relu (output)
    return output

def conv_relu_pool_layer (in_data, nb_filters, filter_shape, pool_shape,
                          pooling=tf.nn.max_pool) :
    conv_out = conv_relu_layer (in_data, nb_filters, filter_shape)
    ksize   = [1, pool_shape[0], pool_shape[1], 1] 
    strides = [1, pool_shape[0], pool_shape[1], 1]
    return pooling (conv_out, ksize=ksize, strides=strides, padding="SAME")

Here is my network:

def create_network_5C (in_data, name="5C") :
    c1 = conv_relu_pool_layer (in_data, 64, [5,5], [2,2])
    c2 = conv_relu_pool_layer (c1,     128, [5,5], [2,2])
    c3 = conv_relu_pool_layer (c2,     256, [5,5], [2,2])
    c4 = conv_relu_pool_layer (c3,      64, [5,5], [2,2])
    return conv_relu_layer    (c4,       2, [5,5])
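
As a sanity check on the architecture above, the spatial size of the output can be worked out by hand. With padding="SAME", the output length along each spatial axis is ceil(input / stride). The convolutions use stride 1, so only the four 2x2 pools shrink the feature map. The helper name `same_out_size` is made up for this sketch, and the 416 input size is taken from the placeholders in the main program.

```python
import math

def same_out_size(size, stride):
    """Output length along one axis for padding="SAME": ceil(size / stride)."""
    return math.ceil(size / stride)

size = 416                          # input height/width
sizes = [size]
for _ in range(4):                  # four conv-ReLU-pool blocks
    size = same_out_size(size, 1)   # convolution, stride 1: size unchanged
    size = same_out_size(size, 2)   # 2x2 max pool, stride 2: size halved
    sizes.append(size)
size = same_out_size(size, 1)       # final conv-ReLU layer, no pooling

print(sizes)  # [416, 208, 104, 52, 26]
print(size)   # 26
```

The result, 26, equals 416 // 16, which is the grid size the labels are reduced to before the loss is computed.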

The loss function:

def loss (logits, labels, num_classes, head=None) :
    # head: optional per-class weights (assumed; not defined in the posted code)
    with tf.name_scope('loss'):
        logits = tf.reshape(logits, (-1, num_classes))
        epsilon = tf.constant(value=1e-4)
        labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))

        # epsilon keeps tf.log away from log(0)
        softmax = tf.nn.softmax(logits) + epsilon

        weighted = labels * tf.log (softmax)
        if head is not None :
            weighted = tf.multiply (weighted, head)
        cross_entropy = - tf.reduce_sum (weighted, reduction_indices=[1])

        cross_entropy_mean = tf.reduce_mean (cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)

        loss = tf.add_n(tf.get_collection('losses'))
    return loss
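
To make explicit what the loss above computes, here is a plain-Python sketch of the same arithmetic on a list of pixels: softmax, add epsilon, cross-entropy summed over classes, mean over pixels. The function names are hypothetical, and `head` is treated as optional per-class weights, which is an assumption because the question does not show its definition.

```python
import math

def softmax(logits):
    """Softmax over one pixel's class scores (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_mean(logits, labels, head=None, epsilon=1e-4):
    """logits, labels: lists of per-pixel lists, one entry per class."""
    num_classes = len(logits[0])
    if head is None:
        head = [1.0] * num_classes      # assumption: unweighted classes
    per_pixel = []
    for pixel_logits, pixel_labels in zip(logits, labels):
        probs = [p + epsilon for p in softmax(pixel_logits)]
        per_pixel.append(-sum(w * y * math.log(p)
                              for w, y, p in zip(head, pixel_labels, probs)))
    return sum(per_pixel) / len(per_pixel)

# One pixel whose true class is 1 (one-hot label [0, 1]).
confident = cross_entropy_mean([[0.0, 8.0]], [[0.0, 1.0]])
undecided = cross_entropy_mean([[0.0, 0.0]], [[0.0, 1.0]])
print(confident, undecided)
```

With equal logits the per-pixel loss is about 0.693, while a confident correct prediction drives it close to 0.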

My main program:

batch_size = 5
# Load data
x = tf.placeholder (tf.float32, [None, 416, 416, 3], name="x")
y = tf.placeholder (tf.float32, [None, 416, 416, 1], name="y")

# Contrast normalization and computation
x_gcn = tf.map_fn (lambda img : tf.image.per_image_standardization (img), x)
logits = create_network_5C (x_gcn)

# Having label at the same dimension as the output
y_p = tf.nn.avg_pool (tf.sign (y),
                      ksize=[1,16,16,1], strides=[1,16,16,1], padding="SAME")
y_rshp = tf.reshape (y_p, [batch_size, 416//16, 416//16])
y_bin = tf.cast (y_rshp > .5, tf.int32)
y_1hot = tf.one_hot (y_bin, 2)

# Compute error
error = loss (logits, y_1hot, 2)
optimizer = tf.train.AdamOptimizer (learning_rate=args.eta).minimize (error)

# Run the session
init_op = tf.global_variables_initializer ()
with tf.Session () as session :
    session.run (init_op)
    err, _ = session.run ([error, optimizer],
                           feed_dict={ x: image_batch,
                                       y: label_batch })
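
The label preparation in the program above (tf.sign, 16x16 average pooling, then a threshold at 0.5) amounts to a majority vote inside each block of the mask. A minimal plain-Python sketch of that idea follows, using a 4x4 mask and 2x2 blocks for readability. The function name is hypothetical, and the sketch assumes the mask dimensions are divisible by the block size, as 416 is by 16.

```python
def downsample_mask(mask, block):
    """Average each block x block tile of a 0/1 mask and threshold at 0.5."""
    rows, cols = len(mask), len(mask[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            tile = [mask[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            # Strictly greater than 0.5, as in `y_rshp > .5`
            out_row.append(1 if sum(tile) / len(tile) > 0.5 else 0)
        out.append(out_row)
    return out

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
small = downsample_mask(mask, 2)
print(small)  # [[1, 0], [0, 1]]
```

One consequence of this scheme is that a block that is only half foreground, or less, is labelled background, so small objects can disappear from the training targets.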

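I noticed that if I reduce the network to only 2 layers, the logits do not drop to 0, but the network does not learn anything either. If I reduce it to 3 layers, the logits do drop to 0, but only after many iterations (whereas with 5 layers they drop to 0 within a few batches).

Could this be related to the so-called "vanishing gradient" problem?

In case it is relevant, my setup is: Ubuntu 16.04 - Python 3.6.4 - tensorflow 1.6.0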

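[Edit] My problem really looks like a dying ReLU, as described here: StackOverflow: FCN training error. However, my data is normalized (between -2 and +2), and I have already tried changing the initial mean and stddev of my weights and biases.

[Edit 2] I tried replacing ReLU with Leaky ReLU or softplus. In both cases the logits stay below 0.1, while the loss stays between 0.6 and 0.7.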
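
A short worked calculation helps interpret those numbers. For a two-class softmax cross-entropy, a network that assigns probability 0.5 to each class has a loss of ln 2, about 0.693, so a loss between 0.6 and 0.7 sits very close to chance level. The sketch below assumes unweighted classes and ignores the small epsilon added inside the loss.

```python
import math

# Loss when both classes receive probability 0.5
chance_loss = math.log(2)

# Geometric-mean probability given to the correct class at loss 0.6 and 0.7
p_at_06 = math.exp(-0.6)
p_at_07 = math.exp(-0.7)

# Largest class probability when the two logits differ by at most 0.1
p_max = 1.0 / (1.0 + math.exp(-0.1))

print(round(chance_loss, 4))                 # 0.6931
print(round(p_at_06, 3), round(p_at_07, 3))  # 0.549 0.497
print(round(p_max, 3))                       # 0.525
```

In other words, logits that stay within 0.1 of each other cannot push either class probability above roughly 0.525, which is consistent with a loss that hovers around ln 2.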

score 0 · Accepted Answer


Using a leaky ReLU was actually enough; after that, I just had to let the network train for a long time.
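
As a rough illustration of why a leaky ReLU can keep training alive where a plain ReLU stalls, the sketch below compares the local gradients of the two activations. It is a simplification and not the author's code: the weight factors in backpropagation are ignored, the gradient at exactly 0 is set by convention, and alpha=0.2 is chosen to match the default of tf.nn.leaky_relu.

```python
import math

def relu_grad(x):
    """Derivative of ReLU with respect to its input (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.2):
    """Derivative of leaky ReLU: slope alpha for non-positive inputs."""
    return 1.0 if x > 0 else alpha

pre_activations = [-3.0, -0.5, 0.0, 0.5, 3.0]
print([relu_grad(x) for x in pre_activations])        # [0.0, 0.0, 0.0, 1.0, 1.0]
print([leaky_relu_grad(x) for x in pre_activations])  # [0.2, 0.2, 0.2, 1.0, 1.0]

# Activation-gradient factor along a path through 5 layers whose
# pre-activations are all negative (weights ignored).
depth = 5
relu_path = math.prod(relu_grad(-1.0) for _ in range(depth))
leaky_path = math.prod(leaky_relu_grad(-1.0) for _ in range(depth))
print(relu_path, leaky_path)
```

With ReLU the factor is exactly 0, so such units receive no update. With the leaky variant the factor is small but nonzero, which fits the observation that the network does learn but needs a long time.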

Answered 2018-04-30T08:40:05.267
