I am building an LSTM network with ELMo embeddings in Keras, and my goal is to minimize RMSE. The ELMo embeddings are obtained with the following snippet:
def ElmoEmbedding(x):
    return elmo_model(inputs={
                          "tokens": tf.squeeze(tf.cast(x, tf.string)),
                          "sequence_len": tf.constant(batch_size * [max_len])
                      },
                      signature="tokens",
                      as_dict=True)["elmo"]
The model is defined as follows:
def create_model(max_len):
    input_text = Input(shape=(max_len,), dtype=tf.string)
    embedding = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(input_text)
    x = Bidirectional(LSTM(units=512, return_sequences=False,
                           recurrent_dropout=0.2, dropout=0.2))(embedding)
    out = Dense(1, activation="relu")(x)
    model = Model(input_text, out)
    return model
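For reference, the shapes flow through the model roughly as annotated below (my own annotation, assuming batch_size = 32 and max_len = 142):

# input_text : (32, 142)        token strings
# embedding  : (32, 142, 1024)  one ELMo vector per token
# x          : (32, 1024)       BiLSTM summary (512 units per direction)
# out        : (32, 1)          single non-negative value (ReLU output)
model = create_model(max_len=142)
model.summary()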
The model is compiled with:
model.compile(optimizer="rmsprop", loss=root_mean_squared_error,
              metrics=[root_mean_squared_error])
and then trained with:
model.fit(np.array(X_tr), y_tr, validation_data=(np.array(X_val), y_val),
          batch_size=batch_size, epochs=5, verbose=1)
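Because ElmoEmbedding fixes "sequence_len" to batch_size * [max_len], every batch has to contain exactly batch_size sentences, so the training and validation splits are trimmed to a multiple of batch_size (which is why the log below reports 8704 and 928 samples out of 9652). A sketch of that trimming, assuming batch_size = 32 (not my exact split code):

# Keep only whole batches in each split.
batch_size = 32
n_tr = (len(X_tr) // batch_size) * batch_size
n_val = (len(X_val) // batch_size) * batch_size
X_tr, y_tr = X_tr[:n_tr], y_tr[:n_tr]
X_val, y_val = X_val[:n_val], y_val[:n_val]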
root_mean_squared_error is defined as:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
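As a quick sanity check, the metric can be evaluated on toy tensors (just a sketch; K is keras.backend as in the rest of the code):

import numpy as np
from keras import backend as K

# With axis=-1 the mean runs over the last dimension only, so for a (batch, 1)
# output this returns one absolute error per sample.
y_true = K.constant(np.array([[1.0], [2.0], [3.0]]))
y_pred = K.constant(np.array([[1.5], [2.0], [2.0]]))
print(K.eval(root_mean_squared_error(y_true, y_pred)))  # -> [0.5 0.  1. ]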
The dataset I have contains 9652 examples; each example is a sentence and the label is a single numerical value. The dataset is split into a training set and a validation set. The maximum sentence length is 142, and I pad every sentence with __PAD__ tokens so that it has length 142. A sentence therefore looks like this before and after padding:
['france', 'is', 'hunting', 'down', 'its', 'citizens', 'who', 'joined', 'twins', 'without', 'trial', 'in', 'iraq']
['france', 'is', 'hunting', 'down', 'its', 'citizens', 'who', 'joined', 'twins', 'without', 'trial', 'in', 'iraq', '__PAD__', '__PAD__', '__PAD__',...., '__PAD__']
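The padding was done with a small helper roughly like this (a sketch, not the exact code; sentences longer than max_len would be truncated, and tokenized_sentences is a hypothetical name for the tokenized input):

def pad_tokens(tokens, max_len=142, pad_token="__PAD__"):
    # Truncate to max_len, then right-pad with the PAD token.
    tokens = tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))

X = [pad_tokens(s) for s in tokenized_sentences]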
When I train this model, I get the following output:
Train on 8704 samples, validate on 928 samples
Epoch 1/5
8704/8704 [==============================] - 655s 75ms/step - loss: 0.9960 - root_mean_squared_error: 0.9960 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 2/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 3/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 4/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 5/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
From epoch 2 through 5, neither the loss nor the metric improves; they stay exactly the same. I am not sure what is going wrong here. Any help would be appreciated.