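I'm following Andrew Ng's Natural Language Processing course (week 1) and trying to fill in the components of the function that performs gradient descent.
The gradientDescent function is given as follows: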
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = None
    for i in range(0, num_iters):
        # get z, the dot product of x and theta
        z = None
        # get the sigmoid of z
        h = None
        # calculate the cost function
        J = None
        # update the weights theta
        theta = None
    ### END CODE HERE ###
    J = float(J)
    return J, theta
To check that the function is correct, the following test data is provided:
# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)
# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")
Once every component of the underlying equations is filled in correctly, the expected output is:
The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]
Technically, the cost function is calculated by taking the dot product of the vectors 'y' and 'log(h)'. Since both 'y' and 'h' are column vectors (m,1), transpose the vector to the left, so that matrix multiplication of a row vector with column vector performs the dot product.
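$$J = -\frac{1}{m} \times \left( \mathbf{y}^T \cdot \log(\mathbf{h}) + (1 - \mathbf{y})^T \cdot \log(1 - \mathbf{h}) \right)$$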
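I was able to work out almost all of the equations except the cost function, which produces a value higher than expected: my current cost function yields 0.81721852, whereas the expected value for the test data is 0.67094970. Here is my current code: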
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    for i in range(0, num_iters):
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        # get the sigmoid of z
        h = sigmoid(z)
        # calculate the loss
        # loss = (h - y)
        # calculate the gradient
        # gradient = np.dot(xT, loss)
        # calculate the cost function
        J = np.sum((np.log(h) - y) ** 2) / (2 * m)
        #print("Iters %d | J: %f" % (i, J))
        # update the weights theta
        theta = theta - ( (alpha/m) * np.dot(xT, (h - y)) )
    ### END CODE HERE ###
    J = float(J)
    return J, theta
How should I change my cost calculation so that it produces the expected value of 0.67094970 instead of the 0.81721852 I get now?
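For comparison, the formula described in the course hint can be sketched in NumPy as shown below. This is a minimal sketch under stated assumptions, not the course's reference solution: `sigmoid` is assumed to be the standard logistic function (the course's own helper is not shown here), `cross_entropy_cost` is a hypothetical name introduced only for illustration, and the small `x` and `y` arrays are made-up inputs rather than the course's test data.

```python
import numpy as np


def sigmoid(z):
    # Assumed definition: the standard logistic function 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy_cost(x, y, theta):
    # Hypothetical helper: J = -1/m * (y^T . log(h) + (1 - y)^T . log(1 - h))
    m = x.shape[0]
    h = sigmoid(np.dot(x, theta))
    # y.T and (1 - y).T are (1, m) row vectors, so each product is a (1, 1) matrix.
    J = -(np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))) / m
    return J.item()


# Made-up inputs: 3 samples, a bias column of ones plus 2 features.
x = np.array([[1.0, 2.0, 3.0],
              [1.0, -1.0, 0.5],
              [1.0, 0.0, 4.0]])
y = np.array([[1.0], [0.0], [1.0]])

# With theta = 0 every prediction is 0.5, so the cost is log(2) whatever the labels are.
print(cross_entropy_cost(x, y, np.zeros((3, 1))))
```

The printed value is log(2), roughly 0.693, which makes a quick sanity check: a cost line that does not give log(2) for all-zero weights is not computing this formula. The expression in the question, `np.sum((np.log(h) - y) ** 2) / (2 * m)`, is a squared-error form applied to `log(h)`, so it is a different quantity from the one the hint describes.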