I have implemented a neural network (a deep autoencoder) and I am trying to run backpropagation on it. The network consists of sigmoid activation functions in the hidden layers and a softmax activation function at the output layer. For the error I use the cross-entropy error function. The input data is a bag-of-words matrix, where each word count is divided by the length of its document to normalize the data.
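Concretely, the normalization step looks roughly like this (a minimal sketch; counts is a hypothetical documents-by-vocabulary count matrix, not my real data):

import numpy as np

counts = np.array([[2., 1., 0.],
                   [0., 3., 3.]])                 # word counts per document
doc_lengths = counts.sum(axis=1, keepdims=True)   # total words in each document
x = counts / doc_lengths                          # each row now sums to 1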
I am using the Conjugate Gradient method to find a local minimum. My problem is basically that during backpropagation the error keeps rising instead of falling. I believe this means I am computing the gradients incorrectly?
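For reference, this is roughly how I hand the objective to the optimizer (a sketch assuming SciPy's scipy.optimize.minimize with method='CG'; net, flat_weights, weight_sizes and x are stand-ins for my actual objects, and jac=True tells SciPy that the function returns both the error and the gradient):

from scipy.optimize import minimize

result = minimize(
    fun=lambda w: net.get_grad_and_error(w, weight_sizes, x),  # returns (error, flat gradient)
    x0=flat_weights,  # all weight matrices flattened into one vector
    jac=True,         # fun returns the gradient alongside the error
    method='CG',      # nonlinear conjugate gradient
)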
The code that computes the error and the gradients is as follows:
def get_grad_and_error(self, weights, weight_sizes, x):
    # append, ones, dot, log, sum, reshape and float64 are numpy names,
    # imported at module level ("from numpy import *").
    weights = self.__convert__(weights, weight_sizes)  # unflatten into per-layer matrices
    x = append(x, ones((len(x), 1), dtype=float64), axis=1)  # add a bias column to the input
    xout, z_values = self.__generate_output_data__(x, weights)
    f = -sum(x[:, :-1] * log(xout))  # cross-entropy error (bias column excluded from the target)

    # Backpropagate the deltas from the output layer down to the first layer.
    number_of_weights = len(weights)
    gradients = []
    delta_k = None
    for i in range(len(weights) - 1, -1, -1):
        if i == number_of_weights - 1:
            # Output layer: softmax + cross-entropy gives delta = output - target.
            delta = xout - x[:, :-1]
            grad = dot(z_values[i - 1].T, delta)
        elif i == 0:
            # First layer: propagate delta through the sigmoid derivative z*(1-z),
            # drop the bias column, and use the bias-augmented input as activations.
            delta = dot(delta_k, weights[i + 1].T) * z_values[i] * (1 - z_values[i])
            delta = delta[:, :-1]
            grad = dot(x.T, delta)
        else:
            # Hidden layers: as above, but with the previous layer's activations.
            delta = dot(delta_k, weights[i + 1].T) * z_values[i] * (1 - z_values[i])
            delta = delta[:, :-1]
            grad = dot(z_values[i - 1].T, delta)
        delta_k = delta
        gradients.append(grad)
    gradients.reverse()

    # Flatten all gradient matrices into one vector for the optimizer.
    gradients_formatted = []
    for g in gradients:
        gradients_formatted = append(gradients_formatted, reshape(g, (1, len(g) * len(g[0])))[0])
    return f, gradients_formatted
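Since I suspect the gradient, the standard diagnostic would be a finite-difference check against the analytic gradient. A minimal sketch (net stands in for my network object; eps and n_checks are arbitrary choices):

import numpy as np

def check_grad(net, flat_weights, weight_sizes, x, eps=1e-5, n_checks=20):
    # Compare the analytic gradient with central finite differences
    # at a few randomly chosen weight coordinates.
    _, grad = net.get_grad_and_error(flat_weights, weight_sizes, x)
    rng = np.random.default_rng(0)
    for idx in rng.choice(len(flat_weights), size=n_checks, replace=False):
        w_plus = flat_weights.copy()
        w_plus[idx] += eps
        w_minus = flat_weights.copy()
        w_minus[idx] -= eps
        f_plus, _ = net.get_grad_and_error(w_plus, weight_sizes, x)
        f_minus, _ = net.get_grad_and_error(w_minus, weight_sizes, x)
        numeric = (f_plus - f_minus) / (2 * eps)
        print(idx, grad[idx], numeric)  # the two values should agree to ~1e-6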
To compute the output of the network I use the following method:
def __generate_output_data__(self, x, weight_matrices_added_biases):
    z_values = []
    for i in range(len(weight_matrices_added_biases) - 1):
        if i == 0:
            z = dbn.sigmoid(dot(x, weight_matrices_added_biases[i]))
        else:
            z = dbn.sigmoid(dot(z_values[i - 1], weight_matrices_added_biases[i]))
        # Append a bias column of ones so the next weight matrix includes biases.
        z = append(z, ones((len(x), 1), dtype=float64), axis=1)
        z_values.append(z)
    # Softmax output layer on top of the last hidden layer.
    xout = dbn.softmax(dot(z_values[-1], weight_matrices_added_biases[-1]))
    return xout, z_values
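To make the shape conventions explicit, here is a standalone sketch of the same forward pass with hypothetical sizes (a 2000-500-2000 layout over N documents; none of these numbers are my real ones):

import numpy as np

N, d_in, h, d_out = 10, 2000, 500, 2000
x = np.hstack([np.random.rand(N, d_in), np.ones((N, 1))])  # (N, d_in+1): input plus bias column
W0 = 0.01 * np.random.randn(d_in + 1, h)                   # bias-augmented input -> hidden
W1 = 0.01 * np.random.randn(h + 1, d_out)                  # bias-augmented hidden -> output
z = 1. / (1 + np.exp(-x.dot(W0)))                          # (N, h): sigmoid hidden activations
z = np.hstack([z, np.ones((N, 1))])                        # (N, h+1) after the bias column
logits = z.dot(W1)                                         # (N, d_out)
xout = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax over each row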
I compute the sigmoid and softmax values as follows:
def sigmoid(x):
    return 1. / (1 + exp(-x))

def softmax(x):
    numerator = exp(x)
    denominator = numerator.sum(axis=1)
    denominator = denominator.reshape((x.shape[0], 1))
    softmax = numerator / denominator
    return softmax
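One detail I am not sure matters here: exp(x) in my softmax can overflow for large activations. A numerically stable variant (mathematically equivalent, since softmax is invariant to shifting each row by a constant) would be:

import numpy as np

def softmax_stable(x):
    # Shift each row by its maximum so the largest exponent is exp(0) = 1.
    shifted = x - x.max(axis=1, keepdims=True)
    numerator = np.exp(shifted)
    return numerator / numerator.sum(axis=1, keepdims=True)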
I would appreciate any help with this. Please let me know if you need me to elaborate on any of the above. Thanks.