python - Problem computing RNN gradients in Theano

Question

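I'm using a vanilla RNN trained with gradient descent (non-batch version), and I'm having trouble computing the gradient of the (scalar) cost. Here is the relevant part of my code: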

class Rnn(object):
    # ............ [skipping the trivial initialization]
    # one step of the vanilla RNN recurrence
    def recurrence(x_t, h_tm_prev):
        h_t = T.tanh(T.dot(x_t, self.W_xh) +
                     T.dot(h_tm_prev, self.W_hh) + self.b_h)
        return h_t

    # unroll the recurrence over the input sequence
    h, _ = theano.scan(
        recurrence,
        sequences=self.input,
        outputs_info=self.h0
    )

    # classify from the last hidden state only
    y_t = T.dot(h[-1], self.W_hy) + self.b_y
    self.p_y_given_x = T.nnet.softmax(y_t)

    self.y_pred = T.argmax(self.p_y_given_x, axis=1)

    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[:, y])


def testRnn(dataset, vocabulary, learning_rate=0.01, n_epochs=50):
    # ............ [skipping the trivial initialization]
    index = T.lscalar('index')
    x = T.fmatrix('x')
    y = T.iscalar('y')
    rnn = Rnn(x, n_x=27, n_h=12, n_y=27)
    nll = rnn.negative_log_likelihood(y)
    # 'cost' is a new, independent input: nothing links it to rnn.params,
    # so the T.grad call below raises DisconnectedInputError
    cost = T.lscalar('cost')
    gparams = [T.grad(cost, param) for param in rnn.params]
    updates = [(param, param - learning_rate * gparam)
               for param, gparam in zip(rnn.params, gparams)]
    # returns the cost of a single training example
    train_model = theano.function(
        inputs=[index],
        outputs=nll,
        givens={
            x: train_set_x[index],
            y: train_set_y[index]
        },
    )
    sgd_step = theano.function(
        inputs=[cost],
        outputs=[],
        updates=updates
    )
    done_looping = False
    while (epoch < n_epochs) and (not done_looping):
        epoch += 1
        tr_cost = 0.
        for idx in xrange(n_train_examples):
            tr_cost += train_model(idx)
        # perform sgd step after going through the complete training set
        sgd_step(tr_cost)
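The value that train_model(idx) returns for one example can be reproduced in plain Python. This is a minimal sketch under stated assumptions. It uses tiny hand-picked weights instead of the 27/12/27 sizes and assumes h0 is a zero vector. The helpers vec_mat, softmax and rnn_nll are hypothetical and are not Theano functions. The sketch shows that the per-example cost ends up as an ordinary number with no record of how it depends on the weights. Adding such numbers together in tr_cost therefore gives Theano nothing to differentiate.

```python
import math

def vec_mat(v, m):
    """Row vector v (length n) times matrix m (n rows x k cols), like T.dot(v, m)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def softmax(z):
    z_max = max(z)  # shift by the max for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]

def rnn_nll(xs, y, W_xh, W_hh, b_h, W_hy, b_y):
    """Per-example NLL: run the recurrence, classify from the last hidden state."""
    h = [0.0] * len(b_h)  # assumption: h0 is a zero vector
    for x_t in xs:  # the loop that theano.scan expresses symbolically
        pre = [a + b + c for a, b, c in zip(vec_mat(x_t, W_xh), vec_mat(h, W_hh), b_h)]
        h = [math.tanh(v) for v in pre]
    logits = [a + b for a, b in zip(vec_mat(h, W_hy), b_y)]  # y_t = h[-1] W_hy + b_y
    p = softmax(logits)
    return -math.log(p[y]), p  # same value as -T.mean(T.log(p)[:, y]) for one row

# Hypothetical toy sizes: n_x = 2, n_h = 2, n_y = 3.
W_xh = [[0.5, -0.3], [0.1, 0.2]]
W_hh = [[0.4, 0.0], [0.0, 0.4]]
b_h = [0.0, 0.1]
W_hy = [[0.3, -0.2, 0.1], [0.0, 0.5, -0.4]]
b_y = [0.0, 0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

nll, p = rnn_nll(xs, 1, W_xh, W_hh, b_h, W_hy, b_y)
print(nll)  # a plain number: it no longer records how it depends on the weights
```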

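For certain reasons I don't want to pass the complete training data to train_model(..); instead I pass one example at a time. The problem is that each call to train_model(..) returns the cost (negative log-likelihood) of that particular example. I then have to aggregate the costs over the complete training set, take the derivative, and perform the corresponding updates of the weight parameters in sgd_step(..). For reasons that are obvious from my current implementation, I get this error:

theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: W_xh

I don't understand how to make 'cost' part of the computational graph, given that I have to wait until it has been aggregated. Or is there a better, more elegant way to achieve the same thing?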

Thanks.
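The aggregation described in the question is full-batch gradient descent. Differentiation is linear, so the gradient of the summed cost equals the sum of the per-example gradients. In the notation below, which is introduced here for illustration, theta stands for the parameters (W_xh, W_hh, b_h, W_hy, b_y), eta for learning_rate, and L_i for the negative log-likelihood of example i:

$$
C(\theta) = \sum_{i=1}^{N} L_i(\theta), \qquad
\nabla_\theta C(\theta) = \sum_{i=1}^{N} \nabla_\theta L_i(\theta), \qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta C(\theta)
$$

The number tr_cost is only C evaluated at one point. It says nothing about how C changes with theta, so no gradient can be recovered from it. The per-example gradients, by contrast, can be summed one example at a time.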

score 0 · Accepted Answer

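It turns out that Theano can only differentiate a cost that is built symbolically from the parameters. A symbolic variable that is not part of that computational graph cannot be connected to it after the fact. Here cost was declared as an independent input (T.lscalar('cost')), so it has no symbolic dependence on rnn.params, and T.grad cannot differentiate it with respect to W_xh. So I had to change the way I pass data to train_model(..). Passing the complete training data instead of a single example solved the problem.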
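The fix above builds the cost over the whole training set inside the graph. If the one-example-at-a-time loop has to stay, a commonly used alternative (not what this answer did) is to accumulate per-example gradients instead of per-example costs. This works because the gradient of a sum is the sum of the gradients. In Theano this would roughly mean compiling train_model with outputs=[T.grad(nll, p) for p in rnn.params] and passing the summed gradients to the update function.

The sketch below does not use Theano; it is a plain-Python illustration under stated assumptions:

- The model is a hypothetical one-unit RNN trained with squared error, instead of the 27/12/27 softmax model.
- The gradients are hand-derived backpropagation through time.
- The helper names loss_and_grads and full_batch_step are made up for this example.

```python
import math

# Hypothetical 1-unit RNN: h_t = tanh(wx*x_t + wh*h_{t-1} + bh), yhat = wy*h_T + by.
# Squared error is used instead of softmax NLL purely to keep the sketch short.
PARAM_NAMES = ("wx", "wh", "bh", "wy", "by")

def loss_and_grads(params, xs, target):
    """Per-example loss and its gradients via backpropagation through time."""
    wx, wh, bh, wy, by = (params[k] for k in PARAM_NAMES)
    hs = [0.0]  # hs[0] is h0 = 0; hs[t + 1] is the state after reading xs[t]
    for x in xs:
        hs.append(math.tanh(wx * x + wh * hs[-1] + bh))
    yhat = wy * hs[-1] + by
    loss = 0.5 * (yhat - target) ** 2

    dy = yhat - target
    grads = {"wx": 0.0, "wh": 0.0, "bh": 0.0, "wy": dy * hs[-1], "by": dy}
    dh = dy * wy  # dL/dh_T
    for t in reversed(range(len(xs))):
        da = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh
        grads["wx"] += da * xs[t]
        grads["wh"] += da * hs[t]
        grads["bh"] += da
        dh = da * wh  # dL/dh_{t-1}
    return loss, grads

def full_batch_step(params, dataset, learning_rate):
    """Visit examples one at a time, sum their gradients, update once per pass."""
    total_loss = 0.0
    acc = {k: 0.0 for k in PARAM_NAMES}
    for xs, target in dataset:
        loss, grads = loss_and_grads(params, xs, target)
        total_loss += loss
        for k in PARAM_NAMES:
            acc[k] += grads[k]
    new_params = {k: params[k] - learning_rate * acc[k] for k in PARAM_NAMES}
    return new_params, total_loss, acc

# Hypothetical toy data: (input sequence, target) pairs.
params = {"wx": 0.5, "wh": -0.3, "bh": 0.1, "wy": 0.8, "by": 0.0}
dataset = [([1.0, 0.5, -0.2], 0.3), ([0.0, -1.0], -0.5), ([0.7], 0.9)]
for epoch in range(50):
    params, epoch_loss, _ = full_batch_step(params, dataset, learning_rate=0.1)
print(epoch_loss)
```

The update is still applied once per pass over the data, as sgd_step(tr_cost) intended. The difference is that each example now contributes something differentiable, its gradient, rather than a bare cost value.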
