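I am using Keras with the Theano backend to work on the online handwriting recognition problem addressed in this paper: http://papers.nips.cc/paper/3213-unconstrained-on-line-handwriting-recognition-with-recurrent-neural-networks.pdf
I followed the Keras image OCR example https://github.com/keras-team/keras/blob/master/examples/image_ocr.py and modified the code to work on online handwriting samples instead of image samples. When training on a dataset of 842 text lines for 200 epochs, each epoch takes about 6 minutes. The CTC log loss decreases after the first epoch but then stays constant for all remaining epochs. I have also tried different optimizers (sgd, adam, adadelta) and learning rates (0.01, 0.1, 0.2), but the loss hardly changes.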
x_train.shape=(842,1263,4) [842 text lines, each with 1263 stroke points of 4 dimensions]
y_train.shape=(842,64) [842 text lines, each with max_len = 64 characters]
Number of label classes (len_alphabet) = 66
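The shapes above do not say how many of the 64 label positions in each row are real characters and how many are padding, nor whether the output layer leaves room for the CTC blank. Here is a minimal sketch of a sanity check in plain Python. It assumes padded positions hold a hypothetical sentinel value (`PAD = -1`, the real data may use a different filler), that label indices run from 0 to `n_alphabet - 1`, and that CTC needs one extra output class for the blank. All function names are made up for illustration.

```python
PAD = -1  # hypothetical padding value; replace with the filler used in the data


def true_label_lengths(label_rows, pad=PAD):
    """Count the non-padding entries in each padded label row."""
    return [sum(1 for v in row if v != pad) for row in label_rows]


def min_ctc_steps(label_row, pad=PAD):
    """Minimum number of time steps CTC needs to emit this label sequence.

    Each character needs one step, and each pair of identical neighbours
    needs one extra blank step between them.
    """
    chars = [v for v in label_row if v != pad]
    repeats = sum(1 for a, b in zip(chars, chars[1:]) if a == b)
    return len(chars) + repeats


def check_labels(label_rows, n_alphabet, n_outputs, n_steps, pad=PAD):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    if n_outputs < n_alphabet + 1:
        problems.append(
            "output layer has %d units but CTC needs %d (alphabet + blank)"
            % (n_outputs, n_alphabet + 1))
    for i, row in enumerate(label_rows):
        chars = [v for v in row if v != pad]
        if any(v < 0 or v >= n_alphabet for v in chars):
            problems.append("row %d has a label index outside 0..%d"
                            % (i, n_alphabet - 1))
        if min_ctc_steps(row, pad) > n_steps:
            problems.append("row %d needs more time steps than the input has"
                            % i)
    return problems


# Tiny made-up example: two label rows padded to length 5.
rows = [[3, 3, 7, PAD, PAD], [1, 2, PAD, PAD, PAD]]
lengths = true_label_lengths(rows)
issues = check_labels(rows, n_alphabet=66, n_outputs=66, n_steps=1263)
print(lengths)
print(issues)
```

With real data, `lengths` would be the values to pass as `label_length`, and any entry in `issues` points at a mismatch between the labels and the network output worth ruling out before tuning optimizers.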
Code snapshot:
size = x_train.shape[0]
trainable = True
inputs = Input(name='the_input', shape=x_train.shape[1:], dtype='float32')
rnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                            name='bidirectional_1',
                            merge_mode='concat', trainable=trainable)(inputs)
birnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                              name='bidirectional_2',
                              merge_mode='concat', trainable=trainable)(rnn_encoded)
output = TimeDistributed(Dense(66, activation='softmax'))(birnn_encoded)
y_pred = Activation('softmax', name='softmax')(output)
labels = Input(name='the_labels', shape=[max_len], dtype='int32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='Adadelta')

absolute_max_string_len = max_len
blank_label = len(alphabet) + 1
labels = np.ones([size, absolute_max_string_len])
input_length = np.zeros([size, 1])
label_length = np.zeros([size, 1])
source_str = []
for i in range(x_train.shape[0]):
    labels[i, :] = y_train[i]
    input_length[i] = x_train.shape[1]
    label_length[i] = len(y_train[i])
    source_str.append('')
inputs_again = {'the_input': x_train,
                'the_labels': labels,
                'input_length': input_length,
                'label_length': label_length,
                'source_str': source_str  # used for visualization only
                }
outputs = {'ctc': np.zeros([size])}
model.fit(inputs_again, outputs, epochs=200, batch_size=25)
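In the snapshot, `y_pred` comes from a `Dense` layer that already has `activation='softmax'` and is then passed through a second `Activation('softmax')`. Whether this explains the flat loss is not established here, but the sketch below, in plain Python with made-up scores, illustrates what a second softmax does to a probability vector: because its inputs all lie in [0, 1], the result is squeezed toward a uniform distribution.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [8.0, 1.0, 0.5, -2.0]  # made-up scores, for illustration only
once = softmax(logits)          # what a single softmax layer would output
twice = softmax(once)           # what stacking a second softmax produces

print("softmax once :", [round(p, 3) for p in once])
print("softmax twice:", [round(p, 3) for p in twice])
# After the second softmax the ratio between the largest and smallest entry
# cannot exceed e (about 2.718), because every input lies in [0, 1].
print("max/min after second softmax:", round(max(twice) / min(twice), 3))
```

If this flattening turns out to matter, one thing to try is keeping only one of the two softmax activations and comparing the loss curve over a few epochs.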
My full code is hosted here: https://github.com/aayushee/HWR/blob/master/Run/CTC.py These are screenshots of the model and the training: https://github.com/aayushee/HWR/blob/master/Run/model.png https://github.com/aayushee/HWR/blob/master/Run/epochs.png
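Please suggest whether the model architecture needs to be modified, whether a different optimizer would work better for this problem, or whether something else could fix it. Thanks!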