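I am trying to implement maximum-likelihood learning of a neural probabilistic language model in Python, starting from the code of a log-bilinear model: https://github.com/wenjieguan/Log-bilinear-language-models/blob/master/lbl.py
I use Theano's grad function to compute the gradients and try to use the function train to update the model's parameters, but it fails with an error. Here is my code: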
def train(self, sentences, alpha = 0.001, batches = 1000):
    print('Start training...')
    self.alpha = alpha
    count = 0
    RARE = self.vocab['<>']
    #print RARE
    q = np.zeros(self.dim, np.float32)
    #print q
    delta_context = [np.zeros((self.dim, self.dim), np.float32) for i in range(self.context) ]
    #print delta_context
    delta_feature = np.zeros((len(self.vocab), self.dim), np.float32)
    #print delta_feature
    for sentence in sentences:
        sentence = self.start_sen + sentence + self.end_sen
        for pos in range(self.context, len(sentence) ):
            count += 1
            featureW = []
            contextMatrix = []
            indices = []
            for i, r in enumerate(sentence[pos - self.context : pos]):
                if r == '<_>':
                    continue  # skip sentence-padding tokens
                index = self.vocab.get(r, RARE)
                print(index)
                ri = self.featureVectors[index]
                #print ri
                ci = self.contextMatrix[i]
                #print ci
                #Calculating predicted representation for the target word
                q += np.dot(ci, ri)
            #Computing energy function
            energy = np.exp(np.dot(self.featureVectors, q) + self.biases)
            #print energy
            #Computing the conditional distribution
            probs = energy / np.sum(energy)
            #print probs
            w_index = self.vocab.get(sentence[pos], RARE)
            #Computing gradient
            logProbs = T.log(probs[w_index])
            print('Gradient start...')
            delta_context, delta_feature = T.grad(logProbs, [self.contextMatrix, self.featureVectors])
            print('Gradient completed!')
            train = theano.function(
                inputs = [self.vocab],
                outputs = [logProbs],
                updates=((self.featureVectors,self.featureVectors - self.alpha * delta_feature),
                         (self.contextMatrix,self.contextMatrix - self.alpha * delta_context)),
            )
    print('Training is finished!')
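I have only just started learning Python and neural probabilistic language models, so this is quite hard for me. Could you please help me? Thank you!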
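
One likely source of the error is that T.grad differentiates symbolic Theano expressions, while probs in the code above is computed with NumPy, and inputs = [self.vocab] passes a Python dict rather than a Theano variable. For comparison, here is a minimal NumPy-only sketch of the same quantities (predicted representation, conditional distribution, log-probability) with the gradient written out by hand instead of obtained from Theano. The function name lbl_log_prob_and_grads, the layout of C as one stacked array, and the toy sizes are assumptions made for illustration and are not taken from lbl.py.

```python
import numpy as np

def lbl_log_prob_and_grads(R, C, b, context_ids, target_id):
    """Log P(target | context) and its gradients for one training example.

    R: (V, dim) word feature vectors
    C: (n, dim, dim) one context matrix per context position
    b: (V,) per-word biases
    (Hypothetical helper; names and shapes are illustrative assumptions.)
    """
    # Predicted representation: q = sum_i C_i r_{w_i}
    q = np.zeros(R.shape[1])
    for i, w in enumerate(context_ids):
        q += C[i].dot(R[w])

    # Scores and a numerically stable log-softmax over the vocabulary
    scores = R.dot(q) + b
    scores -= scores.max()
    log_probs = scores - np.log(np.exp(scores).sum())
    probs = np.exp(log_probs)

    # d log P(target) / d scores = onehot(target) - probs
    d_scores = -probs
    d_scores[target_id] += 1.0

    # Back-propagate through scores = R q + b and q = sum_i C_i r_{w_i}
    d_q = R.T.dot(d_scores)
    grad_R = np.outer(d_scores, q)          # R used as output vectors
    grad_C = np.zeros_like(C)
    for i, w in enumerate(context_ids):
        grad_C[i] = np.outer(d_q, R[w])
        grad_R[w] += C[i].T.dot(d_q)        # R used as context vectors
    grad_b = d_scores
    return log_probs[target_id], grad_R, grad_C, grad_b

# Toy data: 6 words, 4-dimensional features, 2 context positions
rng = np.random.RandomState(0)
R = rng.normal(scale=0.1, size=(6, 4))
C = rng.normal(scale=0.1, size=(2, 4, 4))
b = np.zeros(6)

alpha = 0.1
lp_before, gR, gC, gb = lbl_log_prob_and_grads(R, C, b, [1, 3], 2)
# Gradient ASCENT: maximizing the log-likelihood means adding the gradient
R += alpha * gR
C += alpha * gC
b += alpha * gb
lp_after = lbl_log_prob_and_grads(R, C, b, [1, 3], 2)[0]
print(lp_before, lp_after)
```

Note the sign of the update: because the objective here is the log-probability itself, the parameters move along the gradient. Subtracting alpha times the gradient, as in the updates of the code above, would only be appropriate if the cost were the negative log-probability.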