I am trying to modify the example at http://cs231n.github.io/neural-networks-case-study/#together to handle a numeric target variable, so it becomes a neural network that does regression. I am definitely doing something wrong in the derivative part, because my loss function grows out of control. Here is the code (a small numerical gradient-check sketch is included after the output):
import numpy as np

h = neurons # size of hidden layer
D = X[0].size
K = 1
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))
# some hyperparameters
step_size = 1 #learning rate
reg = 0.001 # regularization strength
loss_vec = []
# gradient descent loop
num_examples = X.shape[0]
for i in xrange(1000):

  # forward pass: evaluate the scores, [N x K]
  hidden_layer = np.maximum(0, np.dot(X, W) + b)  # note, ReLU activation
  scores = np.dot(hidden_layer, W2) + b2

  # element-wise squared error
  loss = np.power(y - scores, 2)
  #if i % 50 == 0:
  loss_vec.append(np.mean(np.abs(loss)))
  print "iteration %d: loss %f" % (i, np.mean(np.abs(loss)))

  # compute the gradient on scores
  dscores = 2*(y - scores)  # here I am not sure is correct

  # backpropagate the gradient to the parameters
  # first backprop into parameters W2 and b2
  dW2 = np.dot(hidden_layer.T, dscores)
  db2 = np.sum(dscores, axis=0, keepdims=True)
  # next backprop into hidden layer
  dhidden = np.dot(dscores, W2.T)
  # backprop the ReLU non-linearity
  dhidden[hidden_layer <= 0] = 0
  # finally into W, b
  dW = np.dot(X.T, dhidden)
  db = np.sum(dhidden, axis=0, keepdims=True)

  # add regularization gradient contribution
  dW2 += reg * W2
  dW += reg * W

  # perform a parameter update
  W += -step_size * dW
  b += -step_size * db
  W2 += -step_size * dW2
  b2 += -step_size * db2
Output of the code:
iteration 0: loss 5786.021888
iteration 1: loss 24248543152533318464172949461134213120.000000
iteration 2: loss 388137710832824223006297769344993376570435619092
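
Since the step I am least confident about is the dscores line, below is a minimal numerical gradient check. It is my own sketch rather than anything from the original notes: it uses a tiny random X and y (the sizes N=5, D=3, h=4, K=1 and eps=1e-5 are arbitrary choices), the same forward pass and squared-error loss as above summed into a scalar, no regularization, and compares my analytic gradient for W2 against a centered finite-difference estimate:

import numpy as np

np.random.seed(0)
N, D, h, K = 5, 3, 4, 1  # tiny sizes, just for the check
X = np.random.randn(N, D)
y = np.random.randn(N, K)
W = 0.01 * np.random.randn(D, h)
b = np.zeros((1, h))
W2 = 0.01 * np.random.randn(h, K)
b2 = np.zeros((1, K))

def total_loss(W, b, W2, b2):
  # same forward pass and squared-error loss as in my training loop, summed into a scalar
  hidden = np.maximum(0, np.dot(X, W) + b)
  scores = np.dot(hidden, W2) + b2
  return np.sum(np.power(y - scores, 2))

# analytic gradient for W2, using the same expressions as my loop (without the reg term)
hidden = np.maximum(0, np.dot(X, W) + b)
scores = np.dot(hidden, W2) + b2
dscores = 2*(y - scores)  # the line I am unsure about
dW2_analytic = np.dot(hidden.T, dscores)

# centered finite-difference estimate of the same gradient
eps = 1e-5
dW2_numeric = np.zeros_like(W2)
for i in range(W2.shape[0]):
  for j in range(W2.shape[1]):
    W2[i, j] += eps
    loss_plus = total_loss(W, b, W2, b2)
    W2[i, j] -= 2 * eps
    loss_minus = total_loss(W, b, W2, b2)
    W2[i, j] += eps  # restore the original value
    dW2_numeric[i, j] = (loss_plus - loss_minus) / (2 * eps)

# a large difference means the analytic dscores/dW2 expression is wrong
print np.max(np.abs(dW2_analytic - dW2_numeric))

If the printed maximum difference is large compared to the gradient entries themselves, the sign or scaling of dscores is the problem; if it is tiny, the backprop is consistent with the loss and the blow-up must come from something else, such as the step size.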