tensorflow - Character-level bidirectional language model in TensorFlow

Question

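Inspired by Andrej Karpathy's Char-RNN, there is a TensorFlow implementation of char-rnn, sherjilozair/char-rnn-tensorflow: multi-layer recurrent neural networks (LSTM, RNN) for character-level language models in Python using TensorFlow. I want to implement a bidirectional character-level language model based on this code. I changed model.py and wrote this simple code: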

import tensorflow as tf
from tensorflow.contrib import rnn, legacy_seq2seq

# Config is not shown in the question; it is assumed to be a project-level
# settings object (model, rnn_size, num_layers, vocab_size, grad_clip, ...).


class Model:
    def __init__(self, input_data, targets, seq_length=Config.max_seq_length, training=True):
        # Pick the recurrent cell type from the configuration.
        if Config.model == 'rnn':
            cell_fn = rnn.BasicRNNCell
        elif Config.model == 'gru':
            cell_fn = rnn.GRUCell
        elif Config.model == 'lstm':
            cell_fn = rnn.BasicLSTMCell
        elif Config.model == 'nas':
            cell_fn = rnn.NASCell
        else:
            raise Exception("model type not supported: {}".format(Config.model))

        # Build separate stacks of cells for the forward and backward directions.
        fw_cells = []
        bw_cells = []
        for _ in range(Config.num_layers):
            fw_cell = cell_fn(Config.rnn_size)
            bw_cell = cell_fn(Config.rnn_size)
            fw_cells.append(fw_cell)
            bw_cells.append(bw_cell)

        self.fw_cell = rnn.MultiRNNCell(fw_cells, state_is_tuple=True)
        self.bw_cell = rnn.MultiRNNCell(bw_cells, state_is_tuple=True)

        self.input_data, self.targets = input_data, targets

        # The softmax layer takes the concatenated forward and backward outputs,
        # hence the input width of rnn_size * 2.
        with tf.variable_scope('rnnlm'):
            softmax_w = tf.get_variable("softmax_w", [Config.rnn_size*2, Config.vocab_size])
            softmax_b = tf.get_variable("softmax_b", [Config.vocab_size])

        embedding = tf.get_variable("embedding", [Config.vocab_size, Config.rnn_size])
        inputs = tf.nn.embedding_lookup(embedding, self.input_data)

        # Split [batch, seq_length, rnn_size] into a list of seq_length tensors.
        inputs = tf.unstack(inputs, num=seq_length, axis=1)

        # Each element of outputs is the forward and backward output at the SAME time step.
        outputs, _, _ = tf.nn.static_bidirectional_rnn(self.fw_cell, self.bw_cell, inputs,
                                                       dtype=tf.float32, scope='rnnlm')
        output = tf.reshape(tf.concat(outputs, 1), [-1, Config.rnn_size*2])

        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)

        # Learning rate variable, initialised to 0.0.
        self.lr = tf.Variable(0.0, trainable=False)

        if training:
            # Per-position cross-entropy; the weights mask out targets with id 0.
            loss = legacy_seq2seq.sequence_loss_by_example(
                    [self.logits],
                    [tf.reshape(self.targets, [-1])],
                    [tf.sign(tf.cast(tf.reshape(self.targets, [-1]), dtype=tf.float32))])
            with tf.name_scope('cost'):
                self.cost = tf.reduce_mean(loss)
            tvars = tf.trainable_variables()
            grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars), Config.grad_clip)

            with tf.name_scope('optimizer'):
                optimizer = tf.train.AdamOptimizer(self.lr)
            self.train_op = optimizer.apply_gradients(zip(grads, tvars))

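During training I see very fast convergence: after nearly 3,000 iterations the loss reaches 0.003. During testing, the probability of every character is 1.0. I think there is a bug somewhere. I would appreciate some help finding my mistake.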

score 0 · Accepted Answer

It looks like you set self.lr = tf.Variable(0.0, trainable=False). Try changing this to a nonzero value. If you are reading probabilities from self.probs during the testing phase, they should be normalized appropriately, since self.probs is the softmax of the logits.
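
As a rough illustration of the two points in this answer, here is a minimal pure-Python sketch. It is not TensorFlow code: the helper names softmax and gradient_step are hypothetical, and a plain gradient-descent step stands in for Adam. It only shows that a step size of zero leaves a weight unchanged and that a softmax output sums to one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_step(weight, grad, lr):
    """One plain gradient-descent update (a stand-in for Adam, by assumption)."""
    return weight - lr * grad


# Softmax output forms a probability distribution over the vocabulary.
probs = softmax([2.0, 1.0, 0.1])

# With a learning rate of zero the weight does not move, whatever the gradient.
unchanged = gradient_step(0.5, grad=2.0, lr=0.0)
moved = gradient_step(0.5, grad=2.0, lr=0.1)
```

If the learning rate is assigned a nonzero value elsewhere before training starts, as some training scripts do, the zero initial value by itself would not prevent learning.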

score 0 · Accepted Answer

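Use the outputs before and after the current position to predict the probability of the current word. In your case, you are using the RNN output at the current position to predict the current word. Because that bidirectional output has already read the character being predicted, the model can simply copy it, which likely explains the near-zero loss and the probabilities of 1.0.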
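
The alignment this answer describes can be sketched in plain Python, with nested lists standing in for the per-step RNN outputs. This is only an illustration under stated assumptions: align_context is a hypothetical helper, the targets are assumed to be the input characters themselves (not shifted by one as in the original char-rnn), and sequence boundaries are padded with zero vectors.

```python
def align_context(fw_outputs, bw_outputs):
    """For each position t, concatenate fw_outputs[t-1] and bw_outputs[t+1].

    Positions with no left or right neighbour get a zero vector, so the
    features for position t never include a state that has read token t.
    """
    seq_len = len(fw_outputs)
    hidden = len(fw_outputs[0])
    zeros = [0.0] * hidden
    features = []
    for t in range(seq_len):
        left = fw_outputs[t - 1] if t > 0 else zeros
        right = bw_outputs[t + 1] if t + 1 < seq_len else zeros
        features.append(left + right)  # list concatenation: [left ; right]
    return features


# Toy states with hidden size 1. fw[t] is assumed to have read tokens 0..t,
# and bw[t] is assumed to have read tokens t..T-1.
fw = [[1.0], [2.0], [3.0]]
bw = [[10.0], [20.0], [30.0]]
features = align_context(fw, bw)
```

In the question's model, the equivalent change would be to shift the forward and backward output lists in this way before concatenating them and feeding the softmax layer, instead of concatenating the outputs from the same time step.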
