I am trying logistic regression with gradient descent on two datasets, and I get different results for each dataset.
Dataset 1 input:
X=
1 2 3
1 4 6
1 7 3
1 5 5
1 5 4
1 6 4
1 3 4
1 4 5
1 1 2
1 3 4
1 7 7
Y=
0
1
1
1
0
1
0
0
0
0
1
Dataset 2 input:
x =
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y =
0
1
1
1
0
1
0
0
0
0
1
The only difference between Dataset 1 and Dataset 2 is the range of the values. When I run my common code on the two datasets, it produces the desired output for Dataset 1 but very strange results for Dataset 2.
My code is given below:
[m, n] = size(x);
% x = [ones(m,1), x]; % not needed here: the x shown above already includes the intercept column of ones
X = x;
%3. In this step we plot the given input data just to see the distribution of the two classes.
pos = find(y == 1); % indices in y of all examples with label 1
neg = find(y == 0); % indices in y of all examples with label 0
% Now we plot column x1 vs. x2 for y = 1 and y = 0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Pass', 'Fail')
hold off
% We leave the first column x0 untouched because it must stay at 1.
% The critical thing to note here is that this is logistic regression, not linear
% regression, so the hypothesis h is different: it is the sigmoid function, based on e.
% j_theta is computed over the whole training set at every iteration.
%
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid as an anonymous function (more idiomatic than the deprecated inline())
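% Quick sanity check of g (illustrative values of my own choosing): g(0)
% returns 0.5, g(10) is close to 1, and g(-10) is close to 0, i.e. the
% sigmoid saturates for large |z|.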
alpha = 1;
theta = zeros(size(x(1,:)))'; % theta must be a 3x1 vector so that the m-by-3 matrix x can multiply it
max_iter = 2000;
j_theta = zeros(max_iter, 1); % zero vector used to store the cost function J(theta) per iteration
for num_iter = 1:max_iter
    % The hypothesis h is recalculated inside the loop because it must be
    % evaluated with the new theta from the previous iteration.
    z = x * theta;
    h = g(z); % this is where the anonymous sigmoid defined earlier is used
    j_theta(num_iter) = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)); % vectorized form of the cost function J(theta)
    grad = (1/m) * x' * (h - y); % vectorized gradient of J(theta) with respect to theta
    theta = theta - alpha .* grad; % actual gradient-descent update of theta
end
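% Optional sanity check (an illustrative addition, not needed for the fit):
% threshold the hypothesis at 0.5 and measure accuracy on the training set.
p = g(x * theta) >= 0.5;                      % predicted class per example
fprintf('training accuracy: %f\n', mean(double(p == y)));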
figure
plot(0:max_iter-1, j_theta, 'b', 'LineWidth', 2) % J(theta) should fall steadily if alpha is small enough
hold off
figure
%3. Replot the input data to see the distribution of the two classes.
pos = find(y == 1); % indices in y of all examples with label 1
neg = find(y == 0); % indices in y of all examples with label 0
% Now we plot column x1 vs. x2 for y = 1 and y = 0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Pass', 'Fail')
plot_x = [min(X(:,2)) - 2, max(X(:,2)) + 2]; % the min and max decide the extent of the plotted decision line
% Calculate the decision boundary line
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
hold off
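For reference, the plot_y formula follows from setting the sigmoid's argument to zero: the decision boundary is the set of points where theta(1) + theta(2)*x1 + theta(3)*x2 = 0, which rearranges to x2 = -(theta(1) + theta(2)*x1) / theta(3), exactly the expression used in the code above.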
Please find the graph for each dataset below.
For Dataset 1:
For Dataset 2:
As you can see, Dataset 1 gives me the correct answer.
That said, I believe the problem is that Dataset 2 has a wide range of values, roughly 10-100, so to normalize it I applied feature scaling to Dataset 2 and got the graph below. The decision line it forms is correct but sits slightly below the expected position; please see for yourself.
Dataset 2 input with feature scaling:
x =
1.00000 -1.16311 -0.89589
1.00000 -0.13957 1.21585
1.00000 1.39573 -0.89589
1.00000 0.37219 0.51194
1.00000 0.37219 -0.19198
1.00000 0.88396 -0.19198
1.00000 -0.65134 -0.19198
1.00000 -0.13957 0.51194
1.00000 -1.67487 -1.59981
1.00000 -0.65134 -0.19198
1.00000 1.39573 1.91977
y =
0
1
1
1
0
1
0
0
0
0
1
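For reference, the scaled matrix above comes from z-score standardization (a minimal sketch, assuming per-column mean and sample standard deviation, which reproduces the values shown; the intercept column is left untouched):

% z-score feature scaling: standardize each feature column by its mean and
% sample standard deviation, keeping the intercept column of ones as-is.
x_scaled = x;
for j = 2:size(x, 2)
    mu    = mean(x(:, j));
    sigma = std(x(:, j));   % std() normalizes by (n-1) by default
    x_scaled(:, j) = (x(:, j) - mu) ./ sigma;
end
x = x_scaled; % the earlier code is then rerun on the scaled x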
The graph obtained after adding feature scaling to my earlier code is given below.
As you can see, if the decision line were a little higher, I would get the perfect output.
Please help me understand this scenario: why does even feature scaling not help? Or is there a mistake in my code, or am I missing something?