45

I've actually been struggling against this for like 2 months now. What is it that makes these different?

hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp

hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp



theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);

The latter works. I'm just not sure why..I struggle to understand the need for the matrix inverse .

4

7 回答 7

78

What you're doing in the first example in the second block you've missed out a step haven't you? I am assuming you concatenated X with a vector of ones.

   temp=X(:,2) * temp

The last example will work but can be vectorized even more to be more simple and efficient.

I've assumed you only have 1 feature. it will work the same with multiple features since all that happens is you add an extra column to your X matrix for each feature. Basically you add a vector of ones to x to vectorize the intercept.

You can update a 2x1 matrix of thetas in one line of code. With x concatenate a vector of ones making it a nx2 matrix then you can calculate h(x) by multiplying by the theta vector (2x1), this is (X * theta) bit.

The second part of the vectorization is to transpose (X * theta) - y) which gives you a 1*n matrix which when multiplied by X (an n*2 matrix) will basically aggregate both (h(x)-y)x0 and (h(x)-y)x1. By definition both thetas are done at the same time. This results in a 1*2 matrix of my new theta's which I just transpose again to flip around the vector to be the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and vector subtraction with theta.

X = data(:, 1); y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:,1)]; 
theta = zeros(2, 1);        

iterations = 2000;
alpha = 0.001;

for iter = 1:iterations
     theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end
于 2014-03-19T19:01:34.327 回答
9

在第一个中,如果 X 是一个 3x2 矩阵并且 theta 是一个 2x1 矩阵,那么“假设”将是一个 3x1 矩阵。

假设 y 是一个 3x1 矩阵,那么您可以执行 (hypotheses - y) 并得到一个 3x1 矩阵,那么该 3x1 的转置是一个分配给 temp 的 1x3 矩阵。

然后将 1x3 矩阵设置为 theta(2),但这不应该是矩阵。

代码的最后两行有效,因为使用我上面的 mxn 示例,

(X * theta)

将是一个 3x1 矩阵。

然后将该 3x1 矩阵减去 y(一个 3x1 矩阵),结果是一个 3x1 矩阵。

(X * theta) - y

所以3x1矩阵的转置是1x3矩阵。

((X * theta) - y)'

最后,一个 1x3 矩阵乘以一个 3x1 矩阵将等于一个标量或 1x1 矩阵,这就是您要寻找的。我相信你已经知道了,但为了彻底,X(:,2) 是 3x2 矩阵的第二列,使其成为 3x1 矩阵。

于 2012-05-16T22:47:40.597 回答
9
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Performs gradient descent to learn theta. Updates theta by taking num_iters 
% gradient steps with learning rate alpha.

% Number of training examples
m = length(y); 
% Save the cost J in every iteration in order to plot J vs. num_iters and check for convergence 
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    h = X * theta;
    stderr = h - y;
    theta = theta - (alpha/m) * (stderr' * X)';
    J_history(iter) = computeCost(X, y, theta);
end

end
于 2017-10-18T21:45:37.553 回答
4

当您更新时,您需要这样做

Start Loop {

temp0 = theta0 - (equation_here);

temp1 = theta1 - (equation_here);


theta0 =  temp0;

theta1 =  temp1;

} End loop
于 2013-10-22T19:21:48.507 回答
3

这可以更简单地用

h = X * theta   % m-dimensional matrix (prediction our hypothesis gives per training example)
std_err = h - y  % an m-dimensional matrix of errors (one per training example)
theta = theta - (alpha/m) * X' * std_err

请记住X, 是设计矩阵,因此 的每一行X代表一个训练示例,而 的每一列X代表所有训练示例中的给定组件(例如第零个或第一个组件)。因此, 的每一列X正是我们想要将元素与std_err之前求和的东西相乘以获得 theta 向量的相应分量。

于 2018-11-01T12:13:33.323 回答
-3
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1 : num_iters
    hypothesis = X * theta;
    Error = (hypothesis - y);
    temp = theta - ((alpha / m) * (Error' * X)');
    theta = temp;
    J_history(iter) = computeCost(X, y, theta);
end
end
于 2019-03-19T19:59:12.117 回答
-9
.
.
.
.
.
.
.
.
.
Spoiler alert












m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
%               theta. 
%
% Hint: While debugging, it can be useful to print out the values
%       of the cost function (computeCost) and gradient here.
% ========================== BEGIN ===========================


t = zeros(2,1);
J = computeCost(X, y, theta);
t = theta - ((alpha*((theta'*X') - y'))*X/m)';
theta = t;
J1 = computeCost(X, y, theta);

if(J1>J),
    break,fprintf('Wrong alpha');
else if(J1==J)
    break;
end;


% ========================== END ==============================

% Save the cost J in every iteration    
J_history(iter) = sum(computeCost(X, y, theta));
end
end
于 2014-06-05T15:55:36.447 回答