python - Need help generating the gradient descent function and cost

Question

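I am following Andrew Ng's Natural Language Processing course, week 1, and am trying to fill in the components of the function that computes gradient descent.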

The gradientDescent function is given as follows:

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = None
    
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = None
        
        # get the sigmoid of z
        h = None
        
        # calculate the cost function
        J = None

        # update the weights theta
        theta = None
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

To check that the function is correct, the following test data is provided:

# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")

Once all components of the underlying equations are filled in correctly, the expected output is:

The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]

Technically, the cost function J is calculated by taking the dot product of the vectors 'y' and 'log(h)'. Since both 'y' and 'h' are column vectors (m,1), transpose the vector on the left, so that matrix multiplication of a row vector with a column vector performs the dot product.

$$J = -\frac{1}{m} \times \left( y^T \cdot \log(h) + (1-y)^T \cdot \log(1-h) \right)$$
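
The cost formula maps almost symbol for symbol onto NumPy. The following is a minimal sketch, not part of the course notebook; the function name `cross_entropy_cost` and the sample values are made up for illustration. It also checks that the transposed dot-product form gives the same number as an elementwise sum.

```python
import numpy as np

def cross_entropy_cost(y, h):
    """Hypothetical helper: J = -(1/m) * (y^T . log(h) + (1-y)^T . log(1-h)).

    y and h are column vectors of shape (m, 1); h must lie strictly in (0, 1).
    """
    m = y.shape[0]
    # (1, m) @ (m, 1) -> (1, 1): the row-times-column product is the dot product
    J = -(1.0 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
    return J.item()  # unwrap the single element of the (1, 1) array

# Illustrative values only
y = np.array([[1.0], [0.0], [1.0]])
h = np.array([[0.9], [0.2], [0.6]])

cost_dot = cross_entropy_cost(y, h)
# The same quantity written as an elementwise mean
cost_sum = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With theta initialised to zeros, every prediction h is 0.5, so this cost starts at log(2), roughly 0.693, whatever the labels are. That gives a quick sanity check for the first iteration.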

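I was able to work out 99% of the equations. The exception is the cost function, which produces a value higher than expected: my current cost function yields 0.81721852, whereas the expected value for the test variables is 0.67094970.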

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        
        # get the sigmoid of z
        h = sigmoid(z)
        
        # calculate the loss
        # loss = (h - y)
        
        # calculate the gradient
        # gradient = np.dot(xT, loss)
        
        # calculate the cost function
        J = np.sum((np.log(h) - y) ** 2) / (2 * m)     
        #print("Iters %d | J: %f" % (i, J))
    
        # update the weights theta
        theta = theta - ( (alpha/m) *  np.dot(xT, (h - y)) )
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

How should I change the terms of my equation so that it yields the expected value 0.67094970 instead of the 0.81721852 I currently get?

score 2 · Accepted Answer

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    yT = y.transpose()
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        
        # get the sigmoid of z
        h = sigmoid(z)        
        
        # calculate the cost function
        J = - (1/m) * (yT.dot(np.log(h)) + (1-yT).dot(np.log(1-h)))
    
        # update the weights theta
        theta = theta - ((alpha/m) *  xT.dot(h - y))
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta
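
For context: the weight update in the question already matches the one in this answer, so only the cost line needed to change. The question computed a squared error between log(h) and y, whereas the assignment asks for the cross-entropy cost. Below is a self-contained sketch that can be run outside the notebook. It assumes `sigmoid` is the standard logistic function (the notebook's own definition is not shown in this thread), and the name `gradient_descent` is my own.

```python
import numpy as np

def sigmoid(z):
    # Assumed definition; the notebook's own sigmoid is not shown in this thread
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, theta, alpha, num_iters):
    # Sketch only; assumes num_iters >= 1 so that J is assigned in the loop
    m = x.shape[0]
    J = None
    for _ in range(num_iters):
        h = sigmoid(x @ theta)  # predictions, shape (m, 1)
        # Cross-entropy cost, evaluated before this iteration's update
        J = -(1.0 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
        # Gradient step: same update as in the question's code
        theta = theta - (alpha / m) * (x.T @ (h - y))
    return J.item(), theta

# Same synthetic test case as in the question
np.random.seed(1)
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

tmp_J, tmp_theta = gradient_descent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(float(t), 8) for t in np.squeeze(tmp_theta)]}")
```

According to the question, a correct implementation should report a cost of 0.67094970 on this data. Independently of that figure, the cost should end up below log(2), the value it has when theta is all zeros.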

score 0 · Accepted Answer

# UNQ_C2 GRADED FUNCTION: gradientDescent
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE ###
    # get 'm', the number of rows in matrix x
    m = x.shape[0]

    
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = x.dot(theta)
        
        # get the sigmoid of z
        sigmoid_v = np.vectorize(sigmoid)
        h = sigmoid_v(z)
        
        # calculate the cost function
        J = (-1/m)*( np.dot(y.T,np.log(h)) +  np.dot((1-y).T , np.log(1-h)))

        # update the weights theta
        theta = theta - ((alpha/m)*(np.dot(x.T, h-y)))
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta
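
A side note on the `np.vectorize` call: if `sigmoid` is built from NumPy operations such as `np.exp`, it already works elementwise on arrays, and the wrapper is unnecessary. NumPy's documentation describes `np.vectorize` as a convenience that is essentially a Python-level loop, so it does not make anything faster. The sketch below assumes the standard logistic definition of `sigmoid`, which is not shown in this thread; if the notebook's version relies on scalar-only functions such as `math.exp`, the wrapper would still be needed.

```python
import numpy as np

def sigmoid(z):
    # Assumed definition, written with np.exp so it accepts scalars and arrays
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([[-2.0], [0.0], [3.0]])

h_direct = sigmoid(z)                  # applied to the whole array at once
h_wrapped = np.vectorize(sigmoid)(z)   # calls sigmoid once per element

same = np.allclose(h_direct, h_wrapped)
```

If the wrapper is kept, building it once before the loop avoids recreating it on every iteration.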
