I'll first summarize my understanding of the cuDNN 5.1 RNN functions:

Tensor dimensions

x = [seq_length, batch_size, vocab_size] # input
y = [seq_length, batch_size, hiddenSize] # output

dx = [seq_length, batch_size, vocab_size] # input gradient
dy = [seq_length, batch_size, hiddenSize] # output gradient

hx = [num_layer, batch_size, hiddenSize] # input hidden state
hy = [num_layer, batch_size, hiddenSize] # output hidden state
cx = [num_layer, batch_size, hiddenSize] # input cell state
cy = [num_layer, batch_size, hiddenSize] # output cell state

dhx = [num_layer, batch_size, hiddenSize] # input hidden state gradient
dhy = [num_layer, batch_size, hiddenSize] # output hidden state gradient
dcx = [num_layer, batch_size, hiddenSize] # input cell state gradient
dcy = [num_layer, batch_size, hiddenSize] # output cell state gradient

w = [param size] # parameters (weights & biases)
dw = [param size] # parameter gradients
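
For concreteness, here is a minimal sketch (my own naming, not taken verbatim from the cuDNN docs) of how these dimensions map to cuDNN 5 descriptors; SEQ_LENGTH, BATCH_SIZE, VOCAB_SIZE, NUM_LAYER and HIDDEN_SIZE are placeholder constants:

/* x needs one 3D descriptor per time step: [batch_size, vocab_size, 1]. */
cudnnTensorDescriptor_t xDesc[SEQ_LENGTH];
for (int t = 0; t < SEQ_LENGTH; ++t) {
    int dims[3]    = {BATCH_SIZE, VOCAB_SIZE, 1};
    int strides[3] = {dims[1] * dims[2], dims[2], 1};
    cudnnCreateTensorDescriptor(&xDesc[t]);
    cudnnSetTensorNdDescriptor(xDesc[t], CUDNN_DATA_FLOAT, 3, dims, strides);
}

/* hx/cx/hy/cy (and their gradients) share one fully packed
   [num_layer, batch_size, hiddenSize] layout. */
cudnnTensorDescriptor_t hxDesc;
int dimsH[3]    = {NUM_LAYER, BATCH_SIZE, HIDDEN_SIZE};
int stridesH[3] = {dimsH[1] * dimsH[2], dimsH[2], 1};
cudnnCreateTensorDescriptor(&hxDesc);
cudnnSetTensorNdDescriptor(hxDesc, CUDNN_DATA_FLOAT, 3, dimsH, stridesH);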

cudnnRNNForwardTraining / cudnnRNNForwardInference

input: x, hx, cx, w
output: y, hy, cy
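
If I am reading the API correctly, the forward call then looks roughly like this (workspace/reserve sizes are queried first; handle, rnnDesc and the device buffers are assumed to be set up already):

size_t workSize, reserveSize;
cudnnGetRNNWorkspaceSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &workSize);
cudnnGetRNNTrainingReserveSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &reserveSize);
/* ... cudaMalloc(&workspace, workSize) and cudaMalloc(&reserveSpace, reserveSize) ... */

cudnnRNNForwardTraining(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, x, hxDesc, hx, cxDesc, cx,
                        wDesc, w,
                        yDesc, y, hyDesc, hy, cyDesc, cy,
                        workspace, workSize,
                        reserveSpace, reserveSize);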

cudnnRNNBackwardData

input: y, dy, dhy, dcy, w, hx, cx
output: dx, dhx, dcx
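
A sketch of the corresponding call, reusing the buffers above (reserveSpace must be the one filled by cudnnRNNForwardTraining for this mini-batch):

cudnnRNNBackwardData(handle, rnnDesc, SEQ_LENGTH,
                     yDesc, y, dyDesc, dy,
                     dhyDesc, dhy, dcyDesc, dcy,
                     wDesc, w, hxDesc, hx, cxDesc, cx,
                     dxDesc, dx, dhxDesc, dhx, dcxDesc, dcx,
                     workspace, workSize,
                     reserveSpace, reserveSize);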

cudnnRNNBackwardWeights

input: x, hx, y, dw
output: dw
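
A sketch of the call; note that dw is listed as an input because cuDNN accumulates into it, which is why it has to be zeroed between updates (step 7 of the workflow below):

cudnnRNNBackwardWeights(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, x, hxDesc, hx, yDesc, y,
                        workspace, workSize,
                        dwDesc, dw,
                        reserveSpace, reserveSize);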

Questions:

  1. Is the following training workflow correct for a multi-layer RNN (num_layer > 1)? (A loop sketch follows this list.)
     1. Initialize hx, cx, dhy, dcy to NULL
     2. Initialize w (weights: small random values, biases: 1)
     3. Forward
     4. Backward data
     5. Backward weights
     6. Update the weights: w += dw
     7. dw = 0
     8. Go to step 3.
  2. When num_layer > 1, can you confirm that cuDNN already implements stacked RNNs (i.e., there is no need to call the forward/backward methods num_layer times)?
  3. Should I re-inject the hidden state and cell state into the network for the next batch?
  4. The output of the LSTM/GRU formulas is hy. Should I use hy or y as the output?
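
To make question 1 concrete, this is the loop I have in mind, as a sketch only (weightCount is the parameter count in floats and lr a placeholder learning rate; I use cublasSaxpy for step 6, so the post's literal w += dw would be alpha = 1):

for (int iter = 0; iter < maxIter; ++iter) {
    cudnnRNNForwardTraining(/* as above */);    /* step 3 */
    cudnnRNNBackwardData(/* as above */);       /* step 4 */
    cudnnRNNBackwardWeights(/* as above */);    /* step 5 */
    float alpha = -lr;                          /* step 6: w += alpha * dw */
    cublasSaxpy(blasHandle, weightCount, &alpha, (float *)dw, 1, (float *)w, 1);
    cudaMemset(dw, 0, weightCount * sizeof(float)); /* step 7: dw = 0 */
}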

The same questions are posted here (I'll keep the answers in sync).
