
I am implementing a seq2seq model in TensorFlow for text summarization. For the encoder I am using a bidirectional RNN layer. Encoding layer:

    def encoding_layer(self, rnn_inputs, rnn_size, num_layers, keep_prob,
                       source_vocab_size,
                       encoding_embedding_size,
                       source_sequence_length,
                       emb_matrix):
        # Look up the embeddings for the source tokens
        embed = tf.nn.embedding_lookup(emb_matrix, rnn_inputs)

        # Stack of LSTM cells with dropout, passed to both directions
        stacked_cells = tf.contrib.rnn.MultiRNNCell(
            [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
             for _ in range(num_layers)])

        # Bidirectional pass over the embedded inputs
        outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked_cells,
                                                         cell_bw=stacked_cells,
                                                         inputs=embed,
                                                         sequence_length=source_sequence_length,
                                                         dtype=tf.float32)

        # Concatenate the forward and backward outputs along the feature axis
        concat_outputs = tf.concat(outputs, 2)

        # Only the forward final state is returned here
        return concat_outputs, state[0]

For the decoder I am using an attention mechanism. Decoding layer:

    def decoding_layer_train(self, encoder_outputs, encoder_state, dec_cell, dec_embed_input,
                             target_sequence_length, max_summary_length,
                             output_layer, keep_prob, rnn_size, batch_size):
        """
        Create a training process in decoding layer
        :return: BasicDecoderOutput containing training logits and sample_id
        """
        # Apply dropout to the decoder cell outputs
        dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell,
                                                 output_keep_prob=keep_prob)

        # Feed the ground-truth embeddings during training
        train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)

        # Bahdanau attention over the concatenated encoder outputs
        attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(rnn_size, encoder_outputs,
                                                                   memory_sequence_length=target_sequence_length)

        attention_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                             attention_layer_size=rnn_size/2)

        # Initialize the attention cell state with the encoder's final state
        state = attention_cell.zero_state(dtype=tf.float32, batch_size=batch_size)
        state = state.clone(cell_state=encoder_state)

        decoder = tf.contrib.seq2seq.BasicDecoder(cell=attention_cell, helper=train_helper,
                                                  initial_state=state,
                                                  output_layer=output_layer)
        outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, impute_finished=True,
                                                          maximum_iterations=max_summary_length)

        return outputs

Now, the initial state of the BasicDecoder expects a state of shape (batch_size, rnn_size). My encoder outputs two final states (forward and backward), each of shape (batch_size, rnn_size).
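For reference, the structure of the `state` returned by my encoding_layer above is roughly as follows (a sketch to make the shapes concrete; `fw_state`, `bw_state`, `top_fw`, `top_bw` are just illustrative names):

    # state is a pair: (forward final state, backward final state).
    # With a MultiRNNCell encoder, each side is a tuple of per-layer LSTMStateTuples,
    # and every c and h tensor has shape (batch_size, rnn_size).
    fw_state, bw_state = state
    top_fw = fw_state[-1]   # LSTMStateTuple(c, h) of the top forward layer
    top_bw = bw_state[-1]   # LSTMStateTuple(c, h) of the top backward layer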

To make it work, I am currently using only one of the encoder states (the forward state). So I would like to know the possible ways to use both the forward and backward encodings from the encoding layer. Should I add the forward and backward states together?

PS - the decoder does not use a bidirectional layer.


2 Answers


If you want to use only the backward encoding:

    # Get only the last cell state of the backward cell
    (_, _), (_, cell_state_bw) = tf.nn.bidirectional_dynamic_rnn(...)
    # Pass the cell_state_bw as the initial state of the decoder cell
    decoder = tf.contrib.seq2seq.BasicDecoder(..., initial_state=cell_state_bw, ...)

What I would suggest you do instead:

    # Get both last states
    (_, _), (cell_state_fw, cell_state_bw) = tf.nn.bidirectional_dynamic_rnn(...)
    # Concatenate the cell states together
    cell_state_final = tf.concat([cell_state_fw.c, cell_state_bw.c], 1)
    # Concatenate the hidden states together
    hidden_state_final = tf.concat([cell_state_fw.h, cell_state_bw.h], 1)
    # Create the actual final state
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
    # Now you can pass this as the initial state of the decoder

However, note that for this second approach to work, the decoder cell size must be twice the encoder cell size.
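For example, a minimal sketch reusing the names from the question's decoding_layer_train (illustrative only; `encoder_final_state` is the concatenated state built above, and creating the decoder cell here is just to show the required width):

    # The concatenated encoder state is 2 * rnn_size wide, so the decoder LSTM
    # has to be built with 2 * rnn_size units to accept it.
    dec_cell = tf.contrib.rnn.LSTMCell(2 * rnn_size)
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism)

    # Seed the attention cell with the concatenated encoder state, as in the question
    state = attention_cell.zero_state(dtype=tf.float32, batch_size=batch_size)
    state = state.clone(cell_state=encoder_final_state)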

Answered 2019-02-05T11:54:13.820

Most of this is already covered in the previous answer.

Regarding your question "Should I add the forward and backward states together?": in my view, we should use both encoder states; otherwise the trained backward encoder state simply goes unused. In addition, `bidirectional_dynamic_rnn` should be given two different stacks of LSTM cells: one for the FW state and another for the BW state.
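As a rough sketch of that second point, mirroring the names in the question's encoding_layer (an illustration, not a drop-in replacement):

    # Two separate stacks so the forward and backward passes learn their own weights
    cells_fw = tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
         for _ in range(num_layers)])
    cells_bw = tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
         for _ in range(num_layers)])

    outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw=cells_fw,
                                                     cell_bw=cells_bw,
                                                     inputs=embed,
                                                     sequence_length=source_sequence_length,
                                                     dtype=tf.float32)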

Answered 2019-03-07T13:58:49.843