I have a simple LSTM network that looks roughly like this:
import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell, MultiRNNCell  # TF 1.x

lstm_activation = tf.nn.relu

# Two stacked LSTM layers (100 and 10 units) sharing the same activation.
cells_fw = [LSTMCell(num_units=100, activation=lstm_activation),
            LSTMCell(num_units=10, activation=lstm_activation)]
stacked_cells_fw = MultiRNNCell(cells_fw)

# Run the stack over the embedded sequences; embedding_layer and
# features['length'] are defined elsewhere in the model.
_, states = tf.nn.dynamic_rnn(cell=stacked_cells_fw,
                              inputs=embedding_layer,
                              sequence_length=features['length'],
                              dtype=tf.float32)

# Concatenate the final hidden state (h) of each layer.
output_states = [s.h for s in states]
states = tf.concat(output_states, 1)
My question is this: when I use no activation (activation=None) or tanh, everything works fine, but as soon as I switch to relu I keep getting "NaN loss during training". Why is that? It is 100% reproducible.
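For reference, a minimal self-contained harness along these lines can be used to check the behavior (a sketch assuming TF 1.x; the toy dimensions, random data, and squared-error head below are made-up stand-ins for my real pipeline, not part of the actual model):

import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell, MultiRNNCell  # TF 1.x

batch_size, max_len, embed_dim = 32, 20, 50  # made-up toy sizes

inputs = tf.placeholder(tf.float32, [None, max_len, embed_dim])
lengths = tf.placeholder(tf.int32, [None])
targets = tf.placeholder(tf.float32, [None, 1])

# Same two-layer relu LSTM stack as above.
cells = [LSTMCell(num_units=100, activation=tf.nn.relu),
         LSTMCell(num_units=10, activation=tf.nn.relu)]
_, states = tf.nn.dynamic_rnn(MultiRNNCell(cells), inputs,
                              sequence_length=lengths, dtype=tf.float32)
state_h = tf.concat([s.h for s in states], 1)

# Made-up regression head and loss, just to have something to train.
loss = tf.reduce_mean(tf.square(tf.layers.dense(state_h, 1) - targets))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        feed = {inputs: np.random.randn(batch_size, max_len,
                                        embed_dim).astype(np.float32),
                lengths: np.full(batch_size, max_len, dtype=np.int32),
                targets: np.random.randn(batch_size, 1).astype(np.float32)}
        _, loss_val = sess.run([train_op, loss], feed_dict=feed)
        if np.isnan(loss_val):  # watch for the loss blowing up
            print('NaN loss at step', step)
            break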