python - Why does the attention decoder's output need to be combined with the attention?

Question

x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)
# Run the attention mechanism.
if i == 0 and initial_state_attention:
  with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
    attns = attention(state)
else:
  attns = attention(state)
with variable_scope.variable_scope("AttnOutputProjection"):
  output = linear([cell_output] + attns, output_size, True)

My question is: why do we need to combine cell_output with attns, instead of just using cell_output as the output?

Thanks

score -1 · Accepted Answer

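An attention mechanism lets the network put more weight on particular nodes.

Here, cell_output is mathematically a matrix; in deep learning terms, it is a representation (a combination) of nodes.

So if you want to give some of the data higher priority, you have to modify cell_output. That is done by combining the original matrix (cell_output) with the attention output through concatenation, addition, or a dot product.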

Let x = 5,
and suppose you want x = 7.
One way to get there is x = x + 2.
That means you have to change your x variable.
You perform the same kind of operation when you apply attention to your hidden-layer nodes or, in your case, to cell_output.
Here x is cell_output and 2 is the attention output.
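
To connect the analogy to the snippet in the question, here is a minimal NumPy sketch of what `linear([cell_output] + attns, output_size, True)` is assumed to compute: concatenate the inputs along the feature axis, then apply one weight matrix and a bias. All shapes, the random values, and the local `linear` helper are hypothetical stand-ins, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, cell_size, attn_size, output_size = 2, 4, 3, 5

# Stand-ins for the tensors named in the question's snippet.
cell_output = rng.normal(size=(batch_size, cell_size))  # RNN output at this step
attns = [rng.normal(size=(batch_size, attn_size))]      # one attention read

# Assumed parameters of the output projection.
W = rng.normal(size=(cell_size + attn_size, output_size))
b = np.zeros(output_size)

def linear(args, W, b):
    """Concatenate the inputs feature-wise, then apply one affine map."""
    return np.concatenate(args, axis=1) @ W + b

output = linear([cell_output] + attns, W, b)

# The same projection, split into a cell_output term and an attention term.
W_cell, W_attn = W[:cell_size], W[cell_size:]
split = cell_output @ W_cell + attns[0] @ W_attn + b
```

Under these assumptions, `output` and `split` agree, so the concatenation plays the role of `x + 2` above: the attention term is added on top of what `cell_output` alone would contribute.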

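If you make no change to cell_output, how would attention ever affect your output representation?

You can pass cell_output directly to the final layer, without combining it with the attention matrix, that is, without applying attention. But then you need to understand why a neural network needs an attention mechanism in the first place!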
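
As a rough illustration of what the attention output contributes, the sketch below builds a context vector as a weighted average of encoder states. It uses plain dot-product scoring for brevity, which is an assumption: the `attention(state)` function in the question may score positions differently. The names `encoder_states` and `decoder_state` and all sizes are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical sizes and values, for illustration only.
num_steps, hidden_size = 6, 4
encoder_states = rng.normal(size=(num_steps, hidden_size))  # one row per input position
decoder_state = rng.normal(size=(hidden_size,))             # current decoder state

# Assumed scoring rule: dot product between decoder state and each encoder state.
scores = encoder_states @ decoder_state

# Attention weights: non-negative and summing to 1 across input positions.
weights = softmax(scores)

# Context vector: encoder states averaged according to the attention weights.
context = weights @ encoder_states
```

The context vector is read from the input sequence at every decoding step. If only `cell_output` were passed to the final layer, that per-step read would reach the output only indirectly, through the next step's input.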
