tensorflow - Adding an attention layer to an encoder-decoder model architecture gives worse results

Question

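I initially defined an encoder-decoder model architecture for Next Phrase Prediction, trained it on some data, and was able to make predictions with the same model. When I then inserted an attention layer into the architecture, the model still trained successfully, but I could not define separate encoder and decoder models for prediction. This is the new model architecture I defined: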

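# Model architecture along with Attention Layer
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional, Concatenate,
                                     Dense, Dropout, TimeDistributed)
from tensorflow.keras.models import Model
# AttentionLayer is a custom layer; its definition is not shown in this post.

# Create the Encoder layers first.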
encoder_inputs = Input(shape=(len_input,))
encoder_emb = Embedding(input_dim=vocab_in_size, output_dim=embedding_dim)

# Bidirectional LSTM or Simple LSTM
encoder_lstm = Bidirectional(LSTM(units=units, return_sequences=True, return_state=True))
encoder_out, fstate_h, fstate_c, bstate_h, bstate_c = encoder_lstm(encoder_emb(encoder_inputs))
# Concatenate forward and backward states: hidden with hidden, cell with cell
state_h = Concatenate()([fstate_h, bstate_h])
state_c = Concatenate()([fstate_c, bstate_c])

encoder_states = [state_h, state_c]

# Now create the Decoder layers.
decoder_inputs = Input(shape=(None,))
decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
decoder_lstm = LSTM(units=units*2, return_sequences=True, return_state=True) # units*2 to match the concatenated encoder states
decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm_out])

# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_lstm_out, attn_out])

# Two dense layers
decoder_d1 = TimeDistributed(Dense(units, activation="relu"))
decoder_d2 = TimeDistributed(Dense(vocab_out_size, activation="softmax"))
decoder_out = decoder_d2(Dropout(rate=.2)(decoder_d1(Dropout(rate=.2)(decoder_concat_input))))
#decoder_out = decoder_d2(Dropout(rate=.2)(decoder_concat_input))

# combining the encoder and the decoder layers together
model = Model(inputs = [encoder_inputs, decoder_inputs], outputs= decoder_out)

model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=['sparse_categorical_accuracy'])
model.summary()
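
To make the tensor shapes in the attention step concrete, here is a minimal NumPy sketch. It uses plain dot-product scoring as a stand-in, since the AttentionLayer used above is not shown and may score differently; the function name dot_product_attention and the toy sizes are assumptions for illustration only. The point is that with a Bidirectional encoder the last dimension of encoder_out is units*2, so anything that stands in for encoder_out at inference time has to have that same width.

```python
import numpy as np

def dot_product_attention(encoder_out, decoder_out):
    """Stand-in attention: dot-product scores, softmax over encoder steps.

    encoder_out: (batch, T_enc, d)   decoder_out: (batch, T_dec, d)
    Returns context (batch, T_dec, d) and weights (batch, T_dec, T_enc).
    """
    # Score every decoder step against every encoder step.
    scores = np.einsum("btd,bsd->bts", decoder_out, encoder_out)
    # Numerically stable softmax over the encoder time axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of encoder outputs for each decoder step.
    context = np.einsum("bts,bsd->btd", weights, encoder_out)
    return context, weights

# Toy sizes (assumed): units=4, so a Bidirectional encoder emits width 8.
units, len_input, dec_steps = 4, 6, 3
rng = np.random.default_rng(0)
encoder_out = rng.normal(size=(2, len_input, units * 2))
decoder_out = rng.normal(size=(2, dec_steps, units * 2))

context, weights = dot_product_attention(encoder_out, decoder_out)
print(context.shape, weights.shape)  # (2, 3, 8) (2, 3, 6)
```

The context tensor has the same width as the decoder LSTM output, which is why the two can be concatenated along the last axis before the dense layers.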

I trained this model and then defined a separate encoder and decoder for inference from the same tensors:

# Changed infmodel
# Create the encoder model from the tensors we previously declared, while training
encoder_model = Model(encoder_inputs, [encoder_out, state_h, state_c], name = 'Encoder')

# decoder model
# Generate a new set of tensors for our new inference decoder
state_input_h = Input(shape=(units*2,), name="state_input_h") # units*2 if Bidirectional LSTM else units*1
state_input_c = Input(shape=(units*2,), name="state_input_c") # units*2
inf_decoder_inputs = Input(shape=(len_input, units*2), name="inf_decoder_inputs") # encoder_out has width units*2 with a Bidirectional encoder
# similar decoder model architecture with state from encoder model
decoder_res, decoder_h, decoder_c = decoder_lstm(decoder_emb(decoder_inputs),
                                                 initial_state=[state_input_h, state_input_c])

# Attention inference
attn_out_res, attn_states_res = attn_layer([inf_decoder_inputs, decoder_res])
# Concat attention input and decoder LSTM output
decoder_out_concat_res = Concatenate(axis=-1, name='concat_layer_inf')([decoder_res, attn_out_res])

inf_decoder_out = decoder_d2(decoder_d1(decoder_out_concat_res))

# finalizing the decoder model
inf_model = Model(inputs=[decoder_inputs, inf_decoder_inputs, state_input_h, state_input_c],
                  outputs=[inf_decoder_out, decoder_h, decoder_c], name = 'Decoder')
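
For prediction, two inference models like these are usually driven by a token-by-token loop. The following is a sketch under assumptions, not code from the question: greedy_decode, start_id, end_id and max_len are hypothetical names, and it assumes encoder_model returns [encoder_out, state_h, state_c] and inf_model takes [token, encoder_out, h, c] and returns [probabilities, h, c], in the order defined above.

```python
import numpy as np

def greedy_decode(encoder_model, inf_model, input_seq, start_id, end_id, max_len):
    """Greedy decoding for a single input sequence of shape (1, len_input).

    Both models only need a Keras-style .predict() method.
    """
    # Encode once; the encoder outputs are reused by attention at every step.
    enc_out, h, c = encoder_model.predict(input_seq)

    target = np.array([[start_id]])  # one token per step, shape (1, 1)
    decoded = []
    for _ in range(max_len):
        # Feed the previous token plus the running LSTM states.
        probs, h, c = inf_model.predict([target, enc_out, h, c])
        next_id = int(np.argmax(probs[0, -1, :]))
        if next_id == end_id:
            break
        decoded.append(next_id)
        target = np.array([[next_id]])
    return decoded
```

Called as greedy_decode(encoder_model, inf_model, input_seq, start_id, end_id, max_len), it returns the predicted token ids without the start and end tokens. The states h and c are fed back on every step, while enc_out stays fixed.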

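After adding attention, the trained model's results got worse, so I believe something is wrong with my model architecture. I arrived at this architecture after trying many permutations. Please take a look at the model architecture.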
