machine-learning - TensorFlow cross-entropy NaN, changing the learning rate seems to have no effect

Question

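TL;DR: I'm trying to build a bidirectional RNN for sequence tagging with TensorFlow.

The goal is to take the input "I like New York" and produce the output "O O LOC_START LOC".

The graph compiles and runs, but the loss becomes NaN after 1 or 2 batches. I understand this could be a learning-rate problem, but changing the learning rate seems to have no effect. I'm currently using AdamOptimizer.

Any help would be appreciated.

Here is my code: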

    # The input and output: a sequence of words, embedded, and a sequence of word classifications, one-hot
    self.input_x = tf.placeholder(tf.float32, [None, n_sequence_length, n_embedding_dim], name="input_x")
    self.input_y = tf.placeholder(tf.float32, [None, n_sequence_length, n_output_classes], name="input_y")

    # New shape: [sequence_length, batch_size (None), embedding_dim]
    inputs = tf.transpose(self.input_x, [1, 0, 2])

    # New shape: [sequence_length * batch_size (None), embedding_dim]
    inputs = tf.reshape(inputs, [-1, n_embedding_dim])

    # Define weights
    w_hidden = tf.Variable(tf.random_normal([n_embedding_dim, 2 * n_hidden_states]))
    b_hidden = tf.Variable(tf.random_normal([2 * n_hidden_states]))

    w_out = tf.Variable(tf.random_normal([2 * n_hidden_states, n_output_classes]))
    b_out = tf.Variable(tf.random_normal([n_output_classes]))

    # Linear activation for the input; this will make it fit to the hidden size
    inputs = tf.nn.xw_plus_b(inputs, w_hidden, b_hidden)

    # Split up the batches into a Python list
    inputs = tf.split(0, n_sequence_length, inputs)

    # Now we define our cell. It takes one word as input, a vector of embedding_size length
    cell_forward = rnn_cell.BasicLSTMCell(n_hidden_states, forget_bias=0.0)
    cell_backward = rnn_cell.BasicLSTMCell(n_hidden_states, forget_bias=0.0)

    # And we add a Dropout Wrapper as appropriate
    if is_training and prob_keep < 1:
        cell_forward = rnn_cell.DropoutWrapper(cell_forward, output_keep_prob=prob_keep)
        cell_backward = rnn_cell.DropoutWrapper(cell_backward, output_keep_prob=prob_keep)

    # And we make it a few layers deep
    cell_forward_multi = rnn_cell.MultiRNNCell([cell_forward] * n_layers)
    cell_backward_multi = rnn_cell.MultiRNNCell([cell_backward] * n_layers)

    # returns outputs = a list T of tensors [batch, 2*hidden]
    outputs = rnn.bidirectional_rnn(cell_forward_multi, cell_backward_multi, inputs, dtype=dtypes.float32)

    # [sequence, batch, 2*hidden]
    outputs = tf.pack(outputs)

    # [batch, sequence, 2*hidden]
    outputs = tf.transpose(outputs, [1, 0, 2])

    # [batch * sequence, 2 * hidden]
    outputs = tf.reshape(outputs, [-1, 2 * n_hidden_states])

    # [batch * sequence, output_classes]
    self.scores = tf.nn.xw_plus_b(outputs, w_out, b_out)

    # [batch * sequence, output_classes]
    inputs_y = tf.reshape(self.input_y, [-1, n_output_classes])

    # [batch * sequence]
    self.predictions = tf.argmax(self.scores, 1, name="predictions")

    # Now calculate the cross-entropy
    losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, inputs_y)
    self.loss = tf.reduce_mean(losses, name="loss")

    if not is_training:
        return

    # Training
    self.train_op = tf.train.AdamOptimizer(1e-4).minimize(self.loss)

    # Evaluate model
    correct_pred = tf.equal(self.predictions, tf.argmax(inputs_y, 1))
    self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name="accuracy")

score 0 · Accepted Answer

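Could there be an example in the training data whose labels are malformed? If so, the cost would become NaN as soon as that example is hit. I suggest this because the NaN apparently still shows up after only a few batches even when the learning rate is zero, and with a zero learning rate the weights never change, so the optimizer cannot be the cause.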
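To illustrate why a bad example would give NaN no matter what the learning rate is, here is a minimal pure-Python sketch of a softmax cross-entropy forward pass. It is not the asker's TensorFlow graph; the function name and the sample values are made up for illustration, and a NaN inside the label vector is only one assumed way an example could be "bad".

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross-entropy between softmax(logits) and a label distribution.

    Uses the log-sum-exp trick so that large logits do not overflow.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_z) for v, y in zip(logits, labels))

logits = [2.0, 0.5, -1.0]

# A well-formed one-hot label gives a finite loss.
good = softmax_cross_entropy(logits, [1.0, 0.0, 0.0])

# Assumed failure mode: a NaN in the labels. The loss is NaN already in
# the forward pass, before any optimizer or learning rate is involved.
bad = softmax_cross_entropy(logits, [float("nan"), 0.0, 0.0])

print(math.isfinite(good), math.isnan(bad))
```

No learning rate appears anywhere in this computation, which is consistent with the observation that changing it makes no difference.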

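Here is how I would debug it:

Set the batch size to 1.
Set the learning rate to 0.0.
When you run a batch, have TensorFlow output the intermediate values, not just the cost.
Run until you get a NaN, then look at what the input was, and inspect the intermediate outputs to find the point where the NaN first appears.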
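The steps above can be sketched as a small scanning loop. This is a framework-free illustration, not TensorFlow code: `forward` is a hypothetical stand-in for whatever call returns your intermediate tensors (for example a `sess.run` with several fetches), and `toy_forward` and the sample data are invented so the snippet can run on its own.

```python
import math

def is_finite_nested(value):
    """Return True if every number in a (possibly nested) list is finite."""
    if isinstance(value, (list, tuple)):
        return all(is_finite_nested(v) for v in value)
    return math.isfinite(value)

def find_first_nonfinite(examples, forward):
    """Feed examples one at a time (batch size 1) and report the first
    non-finite value as (example_index, name), or None if all are finite.

    `forward` is hypothetical: it takes one (x, y) example and returns an
    ordered list of (name, value) pairs for the intermediate results.
    """
    for index, (x, y) in enumerate(examples):
        if not is_finite_nested(x):
            return index, "input_x"
        if not is_finite_nested(y):
            return index, "input_y"
        for name, value in forward(x, y):
            if not is_finite_nested(value):
                return index, name
    return None

def toy_forward(x, y):
    """Invented model with a deliberately huge weight, so that a large
    input overflows to inf in the 'scores' step."""
    scores = [1e200 * v for v in x]
    loss = sum(s * t for s, t in zip(scores, y))
    return [("scores", scores), ("loss", loss)]

examples = [
    ([0.1, 0.2], [1.0, 0.0]),    # fine
    ([1e200, 1.0], [0.0, 1.0]),  # overflows inside the model
    ([0.3, 0.4], [0.0, 1.0]),    # never reached
]

print(find_first_nonfinite(examples, toy_forward))
```

Because the intermediate values are checked in the order they are computed, the returned name is the earliest step that went wrong, which is usually closer to the real cause than the final loss.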
