tensorflow - Increasing label error rate (edit distance) and fluctuating loss?

Question

I'm training a handwriting recognition model with this architecture:

{
  "network": [
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "conv2d",
      "num_filters": 16,
      "kernel_size": 5,
      "stride": 1,
      "padding": "same"
    },
    {
      "layer_type": "max_pool2d",
      "pool_size": 2,
      "stride": 2,
      "padding": "same"
    },
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "dropout",
      "keep_prob": 0.5
    },
    {
      "layer_type": "conv2d",
      "num_filters": 32,
      "kernel_size": 5,
      "stride": 1,
      "padding": "same"
    },
    {
      "layer_type": "max_pool2d",
      "pool_size": 2,
      "stride": 2,
      "padding": "same"
    },
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "dropout",
      "keep_prob": 0.5
    },
    {
      "layer_type": "conv2d",
      "num_filters": 64,
      "kernel_size": 5,
      "stride": 1,
      "padding": "same"
    },
    {
      "layer_type": "max_pool2d",
      "pool_size": 2,
      "stride": 2,
      "padding": "same"
    },
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "dropout",
      "keep_prob": 0.5
    },
    {
      "layer_type": "conv2d",
      "num_filters": 128,
      "kernel_size": 5,
      "stride": 1,
      "padding": "same"
    },
    {
      "layer_type": "max_pool2d",
      "pool_size": 2,
      "stride": 2,
      "padding": "same"
    },
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "dropout",
      "keep_prob": 0.5
    },
    {
      "layer_type": "conv2d",
      "num_filters": 256,
      "kernel_size": 5,
      "stride": 1,
      "padding": "same"
    },
    {
      "layer_type": "max_pool2d",
      "pool_size": 2,
      "stride": 2,
      "padding": "same"
    },
    {
      "layer_type": "l2_normalize"
    },
    {
      "layer_type": "dropout",
      "keep_prob": 0.5
    },
    {
      "layer_type": "collapse_to_rnn_dims"
    },
    {
      "layer_type": "birnn",
      "num_hidden": 128,
      "cell_type": "LSTM",
      "activation": "tanh"
    }
  ],
  "output_layer": "ctc_decoder"
}
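To see what shape reaches the RNN, the spatial size can be traced through the five conv2d + max_pool2d pairs. This is a minimal sketch: the input width of 1024 comes from the question, but the input height of 64 is an assumption for illustration. With `"padding": "same"`, the stride-1 convolutions keep height and width, and each stride-2 "same" pool produces `ceil(n / 2)` outputs; `l2_normalize` and dropout do not change the shape.

```python
import math

# Width 1024 is stated in the question; height 64 is an assumed example value.
INPUT_HEIGHT, INPUT_WIDTH = 64, 1024
# num_filters of the five conv2d layers in the config
CONV_FILTERS = [16, 32, 64, 128, 256]


def trace_shapes(height, width, filters, pool_stride=2):
    """Return (height, width, channels) after each conv2d + max_pool2d pair.

    A "same"-padded conv with stride 1 keeps height/width; a "same"-padded
    pool with stride s yields ceil(n / s) positions per axis.
    """
    shapes = []
    for num_filters in filters:
        height = math.ceil(height / pool_stride)
        width = math.ceil(width / pool_stride)
        shapes.append((height, width, num_filters))
    return shapes


shapes = trace_shapes(INPUT_HEIGHT, INPUT_WIDTH, CONV_FILTERS)
final_h, final_w, final_c = shapes[-1]
# collapse_to_rnn_dims: width -> time steps, height * channels -> features per step
print("time steps:", final_w, "features per step:", final_h * final_c)
```

Under these assumptions the BiRNN sees 32 time steps of 512 features each; a different input height changes only the feature count, not the number of time steps.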

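The training CTC loss drops sharply during the first epoch, but then plateaus and fluctuates for the remaining epochs. The label error rate not only fluctuates, it also doesn't seem to decrease at all.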
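For reference, the label error rate is usually computed as the edit (Levenshtein) distance between the decoded and ground-truth label sequences, divided by the ground-truth length and averaged over the batch; this is what `tf.edit_distance(..., normalize=True)` followed by a mean computes. The sketch below is a plain-Python version of that definition (not the question's own metric code, which is not shown), assuming non-empty references.

```python
def levenshtein(ref, hyp):
    """Edit distance: fewest insertions, deletions and substitutions between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref[i-1] missing from hyp
                            curr[j - 1] + 1,      # hyp[j-1] is an extra label
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def label_error_rate(refs, hyps):
    """Mean edit distance, each normalized by its (non-empty) reference length."""
    rates = [levenshtein(r, h) / len(r) for r, h in zip(refs, hyps)]
    return sum(rates) / len(rates)


# Labels can be strings or lists of class indices
print(levenshtein("kitten", "sitting"))               # 3
print(label_error_rate([[1, 2, 3, 4]], [[1, 3, 4]]))  # 0.25
```

Because the rate is normalized by the reference length, a long run of spurious predictions can push it well above 1.0.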

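I should mention that the sequence length of each sample is close to the length of the longest ground truth: it starts at a width of 1024 and has become 32 by the time it reaches ctc_loss, which is close to the longest ground-truth length of 21.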
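Whether 32 steps are enough depends on the labels themselves, not only on their length: CTC needs one frame per label and an extra blank frame between every pair of identical consecutive labels, otherwise the repeats would be merged. A minimal sketch of that feasibility check (the helper names are hypothetical):

```python
def min_ctc_steps(label):
    """Minimum number of time steps CTC needs to emit `label`.

    Each label needs one frame, and each pair of identical neighbours
    needs one extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits(label, num_steps):
    """True if `label` can be aligned to `num_steps` CTC frames."""
    return min_ctc_steps(label) <= num_steps


print(min_ctc_steps("bookkeeper"))        # 13 (10 labels + 3 repeats)
print(fits("abcdefghijklmnopqrstu", 32))  # True: 21 labels, no repeats
print(fits("a" * 21, 32))                 # False: needs 41 steps
```

A 21-label transcription therefore fits into 32 frames unless it contains more than 11 adjacent repeats.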

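As for preprocessing, I make sure to preserve the aspect ratio when resizing, and I pad the image on the right ~~to make it square~~ so that all images have the same width and the handwriting sits on the left. I also invert the image colors so that the handwritten characters have the highest pixel value (255) and the background has the lowest (0).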
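The preprocessing described above can be sketched in plain Python on a grayscale image stored as a list of rows. This is an illustration under stated assumptions, not the author's code: the nearest-neighbour resize, the target size, and the bottom padding (only needed when an image is wider than the target aspect ratio, a case the question does not cover) are all choices made here.

```python
def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a list-of-rows grayscale image."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def preprocess(image, target_h, target_w):
    """Resize keeping aspect ratio, invert, then pad to target_h x target_w.

    image: rows of 0-255 values, dark ink on light paper.
    """
    h, w = len(image), len(image[0])
    # One scale factor for both axes preserves the aspect ratio
    scale = min(target_h / h, target_w / w)
    new_h = max(1, min(target_h, round(h * scale)))
    new_w = max(1, min(target_w, round(w * scale)))
    resized = resize_nearest(image, new_h, new_w)
    # Invert: ink becomes bright (255), background dark (0)
    inverted = [[255 - p for p in row] for row in resized]
    # Pad with the new background value 0: on the right so text stays left-aligned
    padded = [row + [0] * (target_w - new_w) for row in inverted]
    # Bottom padding (assumption) for images wider than the target aspect ratio
    padded += [[0] * target_w for _ in range(target_h - new_h)]
    return padded


page = [[255, 0, 255, 255],
        [255, 255, 255, 255]]
out = preprocess(page, 4, 16)
print(out[0])  # the single ink pixel became a 2x2 block of 255 at the left
```

Padding with 0 only makes sense after inversion; padding before inverting would produce a bright band that the network could mistake for ink.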

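The predictions look like this. The first part is a sequence of random characters~~, followed by a bunch of zeros at the end (which is probably expected because of the padding)~~.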

INFO:tensorflow:outputs = [[59 45 59 45 59 55 59 55 59 45 59 55 59 55 59 55 45 59  8 59 55 45 55  8
  45  8 45 59 45  8 59  8 45 59 45  8 45 19 55 45 55 45 55 59 45 59 45  8
  45  8 45 55  8 45  8 45 59 45 55 59 55 59  8 55 59  8 45  8 45  8 59  8
  59 45 59 45 59 45 59 45 59 45 59 45 19 45 55 45 22 45 55 45 55  8 45  8
  59 45 59 45 59 45 59 55  8 45 59 45 59 45 59 45 19 45 59 45 19 59 55 24
   4 52 54 55]]
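When reading such outputs it helps to remember how CTC best-path (greedy) decoding works: take the arg-max class per frame, merge consecutive repeats, then drop blanks. In `tf.nn.ctc_loss` the blank is the last class, `num_classes - 1`. The sketch below implements that rule; the class count of 80 is a hypothetical value, not taken from the question.

```python
NUM_CLASSES = 80          # hypothetical; use the model's real num_classes
BLANK = NUM_CLASSES - 1   # TensorFlow's CTC reserves the last class for blank


def ctc_greedy_decode(frame_argmax, blank=BLANK):
    """Best-path decoding: merge repeated frame labels, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_argmax:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


frames = [59, 59, BLANK, 59, 45, 45, BLANK, BLANK, 8]
print(ctc_greedy_decode(frames))  # [59, 59, 45, 8]
```

Because repeats are merged and blanks removed, a decoded sequence can never be longer than the number of frames it was decoded from.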

Here is how I collapse the CNN output to RNN dims:

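import tensorflow as tf


def collapse_to_rnn_dims(inputs):
    """Turn a [batch, height, width, channels] feature map into a batch-major
    RNN input [batch, width, height * channels]: one time step per column."""
    batch_size, height, width, num_channels = inputs.get_shape().as_list()
    if batch_size is None:
        # Unknown batch dimension: let tf.reshape infer it
        batch_size = -1
    # [batch, height, width, channels] -> [width, batch, height, channels]
    time_major_inputs = tf.transpose(inputs, (2, 0, 1, 3))
    # Merge height and channels into one feature vector per time step
    reshaped_time_major_inputs = tf.reshape(time_major_inputs,
                                            [width, batch_size, height * num_channels])
    # [width, batch, features] -> [batch, width, features]
    batch_major_inputs = tf.transpose(reshaped_time_major_inputs, (1, 0, 2))
    return batch_major_inputs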
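The transpose/reshape/transpose sequence maps `inputs[b, h, w, c]` to `outputs[b, w, h * channels + c]`. A plain-Python reference of that mapping on nested lists (a hypothetical helper, handy for checking the TensorFlow version on a tiny tensor):

```python
def collapse_reference(inputs):
    """inputs[b][h][w][c] -> outputs[b][w][h * num_channels + c]."""
    batch, height, width = len(inputs), len(inputs[0]), len(inputs[0][0])
    outputs = []
    for b in range(batch):
        steps = []
        for w in range(width):
            # One time step = image column w, heights stacked, channels innermost
            steps.append([inputs[b][h][w][c]
                          for h in range(height)
                          for c in range(len(inputs[b][h][w]))])
        outputs.append(steps)
    return outputs


# Tag every element with its (b, h, w, c) coordinates: B=2, H=3, W=4, C=2
x = [[[[(b, h, w, c) for c in range(2)] for w in range(4)] for h in range(3)]
     for b in range(2)]
y = collapse_reference(x)
print(y[1][2][1 * 2 + 0])  # (1, 1, 2, 0)
```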

Here is how I collapse the RNN output to CTC dims:

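import tensorflow as tf
import tensorflow.contrib.slim as slim  # TensorFlow 1.x


def convert_to_ctc_dims(inputs, num_classes, num_steps, num_outputs):
    # Flatten the RNN output to one row of num_outputs features per frame
    outputs = tf.reshape(inputs, [-1, num_outputs])
    # Note: slim.fully_connected applies ReLU by default (activation_fn=tf.nn.relu)
    logits = slim.fully_connected(outputs, num_classes,
                                  weights_initializer=slim.xavier_initializer())
    logits = slim.fully_connected(logits, num_classes,
                                  weights_initializer=slim.xavier_initializer())
    # Time-major [num_steps, batch, num_classes]; this assumes the flattened
    # rows above were already ordered time-major
    logits = tf.reshape(logits, [num_steps, -1, num_classes])
    return logits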
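The final `tf.reshape(logits, [num_steps, -1, num_classes])` only yields correct time-major logits if the flattened rows were already in time-major order. If the BiRNN returns batch-major outputs `[batch, time, features]` (an assumption: the post does not show the birnn layer, though batch-major is the default of `tf.nn.bidirectional_dynamic_rnn`), a reshape regroups frames from the same sample into one "time step", whereas a transpose keeps them apart. The plain-Python sketch below tags each frame with `(batch, time)` to make the difference visible; `flatten` and `reshape_rows` are hypothetical helpers mimicking `tf.reshape`.

```python
def flatten(x3d):
    """[d0][d1][...] -> flat list of rows, like tf.reshape(x, [-1, F])."""
    return [row for block in x3d for row in block]


def reshape_rows(rows, d0, d1):
    """Regroup a flat row list into [d0][d1] in row-major order, like tf.reshape."""
    return [rows[i * d1:(i + 1) * d1] for i in range(d0)]


B, T = 2, 3
# Batch-major RNN output: each frame is tagged with (batch, time)
batch_major = [[(b, t) for t in range(T)] for b in range(B)]

naive = reshape_rows(flatten(batch_major), T, B)  # reshape to [T, B]
correct = [[batch_major[b][t] for b in range(B)] for t in range(T)]  # transpose

print(naive[0])    # [(0, 0), (0, 1)]: two frames of sample 0 in "step 0"
print(correct[0])  # [(0, 0), (1, 0)]: frame 0 of each sample
```

In TensorFlow the transposing variant corresponds to reshaping back to `[batch, num_steps, num_classes]` and then applying `tf.transpose(logits, (1, 0, 2))`.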
