r - Fitting a decision boundary to a logistic regression model in R

Question

I am struggling to plot a decision boundary in R using ggplot.

I have 2 variables (exam scores) and a binary classification of whether a student was admitted. The data looks like this:

> head(exam.data)
  Exam1Score Exam2Score Admitted
1 34.62366 78.02469 0
2 30.28671 43.89500 0
3 35.84741 72.90220 0
4 60.18260 86.30855 1
5 79.03274 75.34438 1
6 45.08328 56.31637 0

I can plot the data using ggplot:

exam.plot <- ggplot(data=exam.data, aes(x=Exam1Score, y=Exam2Score, col = ifelse(Admitted == 1,'dark green','red'), size=0.5))+
  geom_point()+
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  theme(legend.position="none")

And I can then successfully fit a logistic regression model:

exam.lm <- glm(data=exam.data, formula=Admitted ~ Exam1Score + Exam2Score, family="binomial")

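After a lot of searching online, I decided to fit the decision boundary manually (I did try for a while to use stat_smooth but could not get it to work). I tried the following: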

# Fit the decision boundary
plot_x <- c(min(exam.data$Exam1Score)-2, max(exam.data$Exam1Score)+2)
plot_y <- (-1 /coef(exam.lm)[3]) * (coef(exam.lm)[2] * plot_x + coef(exam.lm)[1])
db.data <- data.frame(cbind(plot_x, plot_y))
colnames(db.data) <- c('x','y')

# Add the decision boundary plot
ggplot()+geom_line(data=db.data, aes(x=x, y=y))

This plots the decision boundary successfully, but I cannot add it to my existing plot:

> exam.plot+geom_line(data=db.data, aes(x=x, y=y))
Error: Aesthetics must either be length one, or the same length as the dataProblems:x, y

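Can anyone point out what I am doing wrong, or whether I can actually do this with + stat_smooth()?

All code (ex2.R) and files are here: https://github.com/StuHorsman/rscripts/tree/master/R/Coursera

Thanks!

Stuart

Update: I can achieve something similar with base graphics: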

plot(exam.data$Exam1Score, exam.data$Exam2Score, type="n", xlab="Exam 1 Scores", ylab="Exam 2 Scores")      
points(exam.data$Exam1Score[exam.data$Admitted==1], exam.data$Exam2Score[exam.data$Admitted==1], pch=4, col="green")  
points(exam.data$Exam1Score[exam.data$Admitted==0], exam.data$Exam2Score[exam.data$Admitted==0], pch=4, col="red")        
lines(db.data, col="blue")

score 2 · Accepted Answer

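The problem is that in exam.plot you use not only the aesthetics x and y, but also col and size (the latter unnecessarily). Aesthetics set in the call to ggplot() are inherited by every layer, so each layer needs to have all of them defined (I get caught by this regularly).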

Therefore:

exam.plot+geom_line(data=db.data, aes(x=x, y=y), col = "black", size = 1)

does plot.
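
An alternative that leaves exam.plot untouched is to stop the line layer from inheriting the plot-level aesthetics, using the inherit.aes argument that ggplot2 layers accept. A minimal sketch, assuming simulated stand-in data (the scores and the boundary endpoints below are made up for illustration; they are not the asker's data or fitted coefficients):

```r
library(ggplot2)

# Hypothetical stand-in data; not the asker's exam.data
set.seed(1)
exam.data <- data.frame(Exam1Score = runif(100, 30, 100),
                        Exam2Score = runif(100, 30, 100))
exam.data$Admitted <- as.integer(exam.data$Exam1Score + exam.data$Exam2Score > 130)

# Plot-level aesthetics include colour, as in the question
exam.plot <- ggplot(exam.data,
                    aes(x = Exam1Score, y = Exam2Score,
                        col = ifelse(Admitted == 1, "dark green", "red"))) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none")

# Two made-up endpoints of a boundary line (illustrative values only)
db.data <- data.frame(x = c(30, 100), y = c(100, 30))

# inherit.aes = FALSE: this layer uses only the aesthetics given here,
# so the colour mapping that refers to Admitted is not applied to db.data
p <- exam.plot +
  geom_line(data = db.data, aes(x = x, y = y), inherit.aes = FALSE)
print(p)
```

Setting the colour as a constant (as above with col = "black") and switching off inheritance both work; the second is handy when the base plot comes from code you would rather not edit.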

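However, I would suggest changing exam.plot a little and removing every aesthetic that does not apply to all layers (and putting those into the layer definitions instead):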

exam.plot <- ggplot(data=exam.data, aes(x = Exam1Score, y = Exam2Score))+
  # factor() keeps the manual (discrete) colour scale working if Admitted is numeric 0/1
  geom_point(aes(col = factor(Admitted)), size = 0.5)+
  scale_color_manual(values = c('red', 'dark green')) +
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  coord_equal() +  # assuming that the scores have the same scale.
  theme(legend.position="none")

exam.plot + geom_line(data=db.data, aes(x=x, y=y))

which, with the example data

exam.data <- data.frame (Exam1Score = rnorm (100) + 0:1, 
                         Exam2Score = rnorm (100) + 0:1, 
                         Admitted = factor (rep (0:1, 50)))

yields:
[example plot]

(Plotted with the default point size; for this example, 0.5 is barely visible.)
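
For reference, the boundary in the question comes from setting the fitted linear predictor to zero, which is where the predicted probability equals 0.5: b0 + b1 * Exam1Score + b2 * Exam2Score = 0, so Exam2Score = -(b0 + b1 * Exam1Score) / b2. The sketch below checks this on simulated data (the numbers used to generate the data are arbitrary assumptions, not the asker's values) and builds db.data with explicitly named columns:

```r
library(ggplot2)

# Simulated stand-in data; the generating coefficients are arbitrary
set.seed(42)
n <- 200
exam.data <- data.frame(Exam1Score = runif(n, 30, 100),
                        Exam2Score = runif(n, 30, 100))
lin <- -12 + 0.1 * exam.data$Exam1Score + 0.1 * exam.data$Exam2Score
exam.data$Admitted <- rbinom(n, 1, plogis(lin))

exam.lm <- glm(Admitted ~ Exam1Score + Exam2Score,
               data = exam.data, family = "binomial")
b <- unname(coef(exam.lm))  # b[1] intercept, b[2] Exam1Score, b[3] Exam2Score

# Boundary: b[1] + b[2] * x + b[3] * y = 0  =>  y = -(b[1] + b[2] * x) / b[3]
plot_x <- range(exam.data$Exam1Score) + c(-2, 2)
db.data <- data.frame(x = plot_x, y = -(b[1] + b[2] * plot_x) / b[3])

# Points on the boundary should have a predicted probability of 0.5
p_boundary <- predict(exam.lm,
                      newdata = data.frame(Exam1Score = db.data$x,
                                           Exam2Score = db.data$y),
                      type = "response")

boundary.plot <- ggplot(exam.data, aes(x = Exam1Score, y = Exam2Score)) +
  geom_point(aes(col = factor(Admitted))) +
  geom_line(data = db.data, aes(x = x, y = y)) +
  theme_bw()
print(boundary.plot)
```

Building the data frame with data.frame(x = ..., y = ...) avoids any ambiguity about whether the two endpoints end up in rows or in columns.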

score 0 · Accepted Answer

Why not stat_function?

g=ggplot(exam.data,aes(x=Exam1Score,y=Exam2Score,col=factor(Admitted)))
g=g+geom_point(size=2.2)+scale_color_discrete(name="Administered")
g=g+stat_function(fun=function(x){(-Intercept-Beta1*x)/Beta2},xlim=c(0,100))
g

Intercept, Beta1 and Beta2 are the parameters (coefficients) of the fitted logistic regression model.
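
A self-contained sketch of this approach, assuming the three parameters are read from a fitted glm with coef() and using simulated stand-in data (both are assumptions for illustration, and the helper name boundary is hypothetical). The colour mapping is placed in geom_point() so that the function layer does not pick it up:

```r
library(ggplot2)

# Simulated stand-in data; the generating coefficients are arbitrary
set.seed(7)
n <- 200
exam.data <- data.frame(Exam1Score = runif(n, 30, 100),
                        Exam2Score = runif(n, 30, 100))
exam.data$Admitted <- rbinom(n, 1, plogis(-12 + 0.1 * exam.data$Exam1Score +
                                            0.1 * exam.data$Exam2Score))

exam.lm <- glm(Admitted ~ Exam1Score + Exam2Score,
               data = exam.data, family = "binomial")

# Pull the coefficients out by name rather than by position
Intercept <- coef(exam.lm)[["(Intercept)"]]
Beta1 <- coef(exam.lm)[["Exam1Score"]]
Beta2 <- coef(exam.lm)[["Exam2Score"]]

# Exam2Score value at which the linear predictor is zero, for a given Exam1Score
boundary <- function(x) (-Intercept - Beta1 * x) / Beta2

g <- ggplot(exam.data, aes(x = Exam1Score, y = Exam2Score)) +
  geom_point(aes(col = factor(Admitted)), size = 2.2) +
  scale_color_discrete(name = "Admitted") +
  stat_function(fun = boundary, xlim = c(0, 100))
print(g)
```

With xlim = c(0, 100) the line is drawn beyond the range of the observed scores; narrowing xlim to the data range keeps the axes tighter.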
