
I am trying to implement logistic regression using gradient descent.

I compute the cost function j_theta at each iteration and, fortunately, when I plot j_theta against the number of iterations, it is decreasing.

The dataset I am using is as follows:

x=
1   20   30
1   40   60
1   70   30
1   50   50
1   50   40
1   60   40
1   30   40
1   40   50
1   10   20
1   30   40
1   70   70

y=   0
     1
     1
     1
     0
     1
     0
     0
     0
     0
     1

The code I managed to write for logistic regression using gradient descent is:

%1. The code below loads the data file on your desktop into Octave's memory
x=load('stud_marks.dat');
%y=load('ex4y.dat');
y=x(:,3);
x=x(:,1:2);


%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
[m,n]=size(x);
x=[ones(m,1),x];

X=x;


% Now we feature-scale x1 and x2; we skip the first column x0 because it should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);

% We will not use the vectorized technique because it is hard to debug; we shall use many for loops instead

max_iter=50;

theta = zeros(size(x(1,:)))'; 
j_theta=zeros(max_iter,1);         

for num_iter=1:max_iter
  % We calculate the cost Function
  j_cost_each=0;
  alpha=1;
  theta
    for i=1:m
        z=0;
        for j=1:n+1
%            theta(j)
            z=z+(theta(j)*x(i,j));  
            z
        end
        h= 1.0 ./(1.0 + exp(-z));
        j_cost_each=j_cost_each + ( (-y(i) * log(h)) -  ((1-y(i)) * log(1-h)) );  
%       j_cost_each
    end  
    j_theta(num_iter)=(1/m) * j_cost_each;

    for j=1:n+1
        grad(j) = 0;
        for i=1:m
            z=(x(i,:)*theta);  
            z            
            h=1.0 ./ (1.0 + exp(-z));
            h
            grad(j) += (h-y(i)) * x(i,j); 
        end
        grad(j)=grad(j)/m;
        grad(j)
        theta(j)=theta(j)- alpha * grad(j);
    end
end      

figure
plot(0:max_iter-1, j_theta, 'b', 'LineWidth', 2)   % j_theta only has max_iter entries
hold off


figure
%3. In this step we plot the given input data just to see the distribution of the two classes.
pos = find(y == 1);  % positions in y of all examples with class value 1
neg = find(y == 0);  % positions in y of all examples with class value 0
% Now we plot column x1 vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+'); 
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Passed', 'Failed')


plot_x = [min(x(:,2))-2,  max(x(:,2))+2];     % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off

%%%%%%% The only difference is that in the last plot I used X, whereas now I use x, whose features are feature-scaled %%%%%%%%%%%
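For context on the plot_y line above: the decision boundary is the set of points where theta'*x = 0, i.e. where the sigmoid outputs exactly 0.5, so solving theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2 gives x2 = -(theta(2)*x1 + theta(1))/theta(3). A quick numerical check of that identity in Python (the theta values here are made up for illustration, not fitted values from this thread):

```python
import math

# Illustrative parameters only, not fitted values from this thread.
theta = [-1.5, 0.4, 0.8]   # [theta0, theta1, theta2]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# For a few x1 values, the boundary formula yields the x2 where h = 0.5.
boundary_h = []
for x1 in [0.0, 1.0, 5.0]:
    x2 = (-1.0 / theta[2]) * (theta[1] * x1 + theta[0])   # same formula as plot_y
    boundary_h.append(sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2))

print(boundary_h)   # each value is 0.5 (up to floating-point rounding)
```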

If you look at the plot of x1 vs x2, it looks like this:

(image: x1 vs x2 scatter plot)

After running the code, I got a decision boundary. The shape of the decision line seems OK, but it is slightly offset. The plot of x1 vs x2 with the decision boundary looks like this:

(image: x1 vs x2 with the offset decision boundary)

Please tell me where I went wrong...

Thanks :)

New graph:

(image: new plot with feature-scaled axes)


If you look at the new graph, the coordinates of the x axis have changed. That's because I used x (feature scaled) instead of X.

1 Answer


The problem is in your cost function calculation and/or your gradient computation; your plotting code is fine. I ran your dataset through my own implementation of logistic regression, but using the vectorized technique, because in my opinion it is easier to debug. The final values of theta I got are:

theta = [-76.4242, 0.8214, 0.7948], and I also used alpha = 0.3.

I plotted the decision boundary and it looks fine. I recommend using the vectorized form, as I think it is easier to implement and debug.

(image: decision boundary)
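As a quick sanity check (my addition, not part of the original answer): plugging the quoted theta into the sigmoid on the question's raw, unscaled data classifies all 11 training points correctly. In Python:

```python
import math

# The question's dataset: [x1, x2] marks and labels y.
X = [[20, 30], [40, 60], [70, 30], [50, 50], [50, 40], [60, 40],
     [30, 40], [40, 50], [10, 20], [30, 40], [70, 70]]
y = [0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1]

# theta reported in the answer (fit on unscaled features).
theta = [-76.4242, 0.8214, 0.7948]

def predict(x1, x2):
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

preds = [predict(x1, x2) for x1, x2 in X]
print(preds == y)   # True: every training point is classified correctly
```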

I also think your gradient descent implementation is not entirely correct: 50 iterations are not enough, and the cost at the last iteration is not low enough. Perhaps you should run it for more iterations with a stopping condition. Also have a look at this lecture on optimization techniques: https://class.coursera.org/ml-006/lecture/37
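A minimal sketch of the vectorized form the answer recommends, written in Python/NumPy rather than Octave. Assumptions of my own: the question's dataset, the question's mean/std feature scaling, and alpha = 0.3; because of the scaling, the fitted theta will differ numerically from the unscaled values quoted above.

```python
import numpy as np

# The question's dataset.
X = np.array([[20, 30], [40, 60], [70, 30], [50, 50], [50, 40], [60, 40],
              [30, 40], [40, 50], [10, 20], [30, 40], [70, 70]], dtype=float)
y = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1], dtype=float)

# Feature-scale, then prepend the x0 = 1 column (same preprocessing as the question).
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # ddof=1 matches Octave's std
m, n = X.shape
X = np.hstack([np.ones((m, 1)), X])

alpha, max_iter = 0.3, 5000
theta = np.zeros(n + 1)
costs = []

for _ in range(max_iter):
    h = 1.0 / (1.0 + np.exp(-X @ theta))       # all m hypotheses at once
    hc = np.clip(h, 1e-12, 1 - 1e-12)          # guard against log(0) in the cost
    costs.append(-(y @ np.log(hc) + (1 - y) @ np.log(1 - hc)) / m)
    theta -= alpha * (X.T @ (h - y)) / m       # simultaneous update of every theta(j)

preds = (1.0 / (1.0 + np.exp(-X @ theta)) >= 0.5).astype(float)
print(costs[0], costs[-1])   # final cost is much smaller than the initial cost
```

Note the single line `theta -= alpha * (X.T @ (h - y)) / m`: it updates all components of theta at once from the same h, unlike the question's inner loop, which updates theta(j) one at a time so that later gradients are computed with partially updated parameters.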

Answered 2014-07-22T16:13:37.390