I am trying logistic regression with gradient descent on two datasets, and I get different results for each dataset.
Dataset 1 input:
X=
1 2 3
1 4 6
1 7 3
1 5 5
1 5 4
1 6 4
1 3 4
1 4 5
1 1 2
1 3 4
1 7 7
Y=
0
1
1
1
0
1
0
0
0
0
1
Dataset 2 input:
x =
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y =
0
1
1
1
0
1
0
0
0
0
1
The only difference between Dataset 1 and Dataset 2 is the range of the values. When I run my common code on the two datasets, it produces the desired output for Dataset 1 but very strange results for Dataset 2.
My code is given below:
[m, n] = size(x);
% x = [ones(m,1), x]; % not needed here: the x shown above already includes the intercept column of ones
X = x;
%3. In this step we plot the given input data just to see the distribution of the two classes.
pos = find(y == 1); % indices in y of all examples with label 1
neg = find(y == 0); % indices in y of all examples with label 0
% Now we plot column x1 vs. x2 for y = 1 and y = 0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Pass', 'Fail')
hold off
% We leave the first column x0 untouched because it must stay at 1.
% The critical thing to note here is that this is logistic regression, not linear
% regression, so the hypothesis h is different: it is the sigmoid function, based on e.
% j_theta is computed over the whole training set at every iteration.
%
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid as an anonymous function (more idiomatic than the deprecated inline())
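% Quick sanity check of g (illustrative values of my own choosing): g(0)
% returns 0.5, g(10) is close to 1, and g(-10) is close to 0, i.e. the
% sigmoid saturates for large |z|.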
alpha = 1;
theta = zeros(size(x(1,:)))'; % theta must be a 3x1 vector so that the m-by-3 matrix x can multiply it
max_iter = 2000;
j_theta = zeros(max_iter, 1); % zero vector used to store the cost function J(theta) per iteration
for num_iter = 1:max_iter
    % The hypothesis h is recalculated inside the loop because it must be
    % evaluated with the new theta from the previous iteration.
    z = x * theta;
    h = g(z); % this is where the anonymous sigmoid defined earlier is used
    j_theta(num_iter) = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)); % vectorized form of the cost function J(theta)
    grad = (1/m) * x' * (h - y); % vectorized gradient of J(theta) with respect to theta
    theta = theta - alpha .* grad; % actual gradient-descent update of theta
end
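% Optional sanity check (an illustrative addition, not needed for the fit):
% threshold the hypothesis at 0.5 and measure accuracy on the training set.
p = g(x * theta) >= 0.5;                      % predicted class per example
fprintf('training accuracy: %f\n', mean(double(p == y)));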
figure
plot(0:max_iter-1, j_theta, 'b', 'LineWidth', 2) % J(theta) should fall steadily if alpha is small enough
hold off
figure
%3. Replot the input data to see the distribution of the two classes.
pos = find(y == 1); % indices in y of all examples with label 1
neg = find(y == 0); % indices in y of all examples with label 0
% Now we plot column x1 vs. x2 for y = 1 and y = 0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Pass', 'Fail')
plot_x = [min(X(:,2)) - 2, max(X(:,2)) + 2]; % the min and max decide the extent of the plotted decision line
% Calculate the decision boundary line
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
hold off
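For reference, the plot_y formula follows from setting the sigmoid's argument to zero: the decision boundary is the set of points where theta(1) + theta(2)*x1 + theta(3)*x2 = 0, which rearranges to x2 = -(theta(1) + theta(2)*x1) / theta(3), exactly the expression used in the code above.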
Please find the graph for each dataset below.
For Dataset 1:
For Dataset 2:
As you can see, Dataset 1 gives me the correct answer.
That said, I believe the problem is that Dataset 2 has a wide range of values, roughly 10-100, so to normalize it I applied feature scaling to Dataset 2 and got the graph below. The decision line it forms is correct but sits slightly below the expected position; please see for yourself.
Dataset 2 input with feature scaling:
x =
1.00000 -1.16311 -0.89589
1.00000 -0.13957 1.21585
1.00000 1.39573 -0.89589
1.00000 0.37219 0.51194
1.00000 0.37219 -0.19198
1.00000 0.88396 -0.19198
1.00000 -0.65134 -0.19198
1.00000 -0.13957 0.51194
1.00000 -1.67487 -1.59981
1.00000 -0.65134 -0.19198
1.00000 1.39573 1.91977
y =
0
1
1
1
0
1
0
0
0
0
1
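For reference, the scaled matrix above comes from z-score standardization (a minimal sketch, assuming per-column mean and sample standard deviation, which reproduces the values shown; the intercept column is left untouched):

% z-score feature scaling: standardize each feature column by its mean and
% sample standard deviation, keeping the intercept column of ones as-is.
x_scaled = x;
for j = 2:size(x, 2)
    mu    = mean(x(:, j));
    sigma = std(x(:, j));   % std() normalizes by (n-1) by default
    x_scaled(:, j) = (x(:, j) - mu) ./ sigma;
end
x = x_scaled; % the earlier code is then rerun on the scaled x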
The graph obtained after adding feature scaling to my earlier code is given below.
As you can see, if the decision line were a little higher, I would get the perfect output.
Please help me understand this scenario: why does even feature scaling not help? Or is there a mistake in my code, or am I missing something?