tensorflow - What are softmax_w and softmax_b in this document?

Question

I'm new to TensorFlow and need to train a language model, but I ran into some difficulty while reading the documentation, shown below.

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

I don't understand why this line is needed:

logits = tf.matmul(output, softmax_w) + softmax_b

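As I understand it, once the output has been computed and target_words is known, we can compute the loss directly. The pseudocode seems to add an extra layer. Also, what are softmax_w and softmax_b, which are not mentioned anywhere else? I suspect I am missing something important, given how simple this question is.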

Please point me in the right direction; any suggestions would be highly appreciated. Thanks a lot.

score 3 · Accepted Answer

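All the code does is apply an extra linear transformation before computing the softmax. softmax_w should be a tf.Variable containing a weight matrix, and softmax_b should be a tf.Variable containing a bias vector.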

See the softmax example in this tutorial for more details: https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html#softmax-regressions
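To see why the extra linear step is useful, here is a minimal sketch of the shapes involved, written with NumPy rather than TensorFlow so it runs on its own. The sizes (batch_size, lstm_size, vocab_size), the random stand-in for the LSTM output, and the target ids are all assumptions made up for illustration; in the TensorFlow version softmax_w and softmax_b would be trainable tf.Variable objects. The point is that the LSTM output has lstm_size values per example, while the softmax needs one score per vocabulary word, so a projection from lstm_size to vocab_size sits in between.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, lstm_size, vocab_size = 2, 4, 10

rng = np.random.default_rng(0)

# Stand-in for the LSTM output: one lstm_size-dimensional vector per example.
output = rng.normal(size=(batch_size, lstm_size))

# Projection parameters (trainable tf.Variable objects in the TensorFlow version).
softmax_w = rng.normal(scale=0.1, size=(lstm_size, vocab_size))  # weight matrix
softmax_b = np.zeros(vocab_size)                                 # bias vector

# Linear transformation: maps hidden size to vocabulary size (one score per word).
logits = output @ softmax_w + softmax_b

# Numerically stable softmax over the vocabulary axis.
shifted = logits - logits.max(axis=1, keepdims=True)
exp_scores = np.exp(shifted)
probabilities = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Cross-entropy loss against hypothetical target word ids.
target_words = np.array([3, 7])
loss = -np.log(probabilities[np.arange(batch_size), target_words]).mean()

print(output.shape, logits.shape)
print(probabilities.sum(axis=1))
```

Without softmax_w the output would have the wrong width to be read as a distribution over words, unless lstm_size happened to equal the vocabulary size.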
