r - 在图上添加回归线方程和 R^2

Question

我想知道如何在ggplot. 我的代码是：

library(ggplot2)

df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
p <- ggplot(data = df, aes(x = x, y = y)) +
            geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
            geom_point()
p

任何帮助将不胜感激。

score 280 · Accepted Answer

这是一种解决方案

# GET EQUATION AND R-SQUARED AS STRING
# SOURCE: https://groups.google.com/forum/#!topic/ggplot2/1TgH-kG5XMA

lm_eqn <- function(df){
    m <- lm(y ~ x, df);
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
         list(a = format(unname(coef(m)[1]), digits = 2),
              b = format(unname(coef(m)[2]), digits = 2),
             r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq));
}

p1 <- p + geom_text(x = 25, y = 300, label = lm_eqn(df), parse = TRUE)

编辑。我找出了我选择此代码的来源。这是ggplot2 谷歌组中原始帖子的链接

score 189 · Accepted Answer

stat_poly_eq()我在我的包中包含了一个统计数据ggpmisc，允许这个答案：

library(ggplot2)
library(ggpmisc)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
my.formula <- y ~ x
p <- ggplot(data = df, aes(x = x, y = y)) +
   geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +
   stat_poly_eq(formula = my.formula, 
                aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
                parse = TRUE) +         
   geom_point()
p

该统计数据适用于没有缺失项的任何多项式，并且希望具有足够的灵活性以普遍有用。R^2 或调整后的 R^2 标签可以与任何带有 lm() 的模型公式一起使用。作为一个 ggplot 统计数据，它在组和方面的表现都符合预期。

'ggpmisc' 包可通过 CRAN 获得。

0.2.6 版刚刚被 CRAN 接受。

它解决了@shabbychef 和@MYaseen208 的评论。

@MYaseen208 这显示了如何添加帽子。

library(ggplot2)
library(ggpmisc)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
my.formula <- y ~ x
p <- ggplot(data = df, aes(x = x, y = y)) +
   geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +
   stat_poly_eq(formula = my.formula,
                eq.with.lhs = "italic(hat(y))~`=`~",
                aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
                parse = TRUE) +         
   geom_point()
p

@shabbychef 现在可以将方程中的变量与用于轴标签的变量匹配。要将x替换为z并将y替换为h，可以使用：

p <- ggplot(data = df, aes(x = x, y = y)) +
   geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +
   stat_poly_eq(formula = my.formula,
                eq.with.lhs = "italic(h)~`=`~",
                eq.x.rhs = "~italic(z)",
                aes(label = ..eq.label..), 
                parse = TRUE) + 
   labs(x = expression(italic(z)), y = expression(italic(h))) +          
   geom_point()
p

作为这些正常的 R 解析表达式，希腊字母现在也可以在等式的 lhs 和 rhs 中使用。

[2017-03-08] @elarry 编辑以更准确地解决原始问题，展示如何在方程式和 R2 标签之间添加逗号。

p <- ggplot(data = df, aes(x = x, y = y)) +
  geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +
  stat_poly_eq(formula = my.formula,
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., ..rr.label.., sep = "*plain(\",\")~")), 
               parse = TRUE) +         
  geom_point()
p

[2019-10-20] @helen.h 我在下面给出了使用stat_poly_eq()with 分组的示例。

library(ggpmisc)
df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))
my.formula <- y ~ x
p <- ggplot(data = df, aes(x = x, y = y, colour = group)) +
  geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
  stat_poly_eq(formula = my.formula, 
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
               parse = TRUE) +         
  geom_point()
p

p <- ggplot(data = df, aes(x = x, y = y, linetype = group)) +
  geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
  stat_poly_eq(formula = my.formula, 
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
               parse = TRUE) +         
  geom_point()
p

[2020-01-21] @Herman 乍一看可能有点反直觉，但是在使用分组时要获得单个方程需要遵循图形的语法。要么将创建分组的映射限制为单个图层（如下所示），要么保留默认映射并在您不希望分组的图层中使用常量值覆盖它（例如colour = "black"）。

继续前面的例子。

p <- ggplot(data = df, aes(x = x, y = y)) +
  geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
  stat_poly_eq(formula = my.formula, 
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
               parse = TRUE) +         
  geom_point(aes(colour = group))
p

[2020-01-22] 为了完整起见，一个带有分面的示例，证明在这种情况下也满足了图形语法的期望。

library(ggpmisc)
df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))
my.formula <- y ~ x

p <- ggplot(data = df, aes(x = x, y = y)) +
  geom_smooth(method = "lm", se=FALSE, formula = my.formula) +
  stat_poly_eq(formula = my.formula, 
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
               parse = TRUE) +         
  geom_point() +
  facet_wrap(~group)
p

score 104 · Accepted Answer

我更改了几行源代码stat_smooth和相关函数，以创建一个添加拟合方程和 R 平方值的新函数。这也适用于构面图！

library(devtools)
source_gist("524eade46135f6348140")
df = data.frame(x = c(1:100))
df$y = 2 + 5 * df$x + rnorm(100, sd = 40)
df$class = rep(1:2,50)
ggplot(data = df, aes(x = x, y = y, label=y)) +
  stat_smooth_func(geom="text",method="lm",hjust=0,parse=TRUE) +
  geom_smooth(method="lm",se=FALSE) +
  geom_point() + facet_wrap(~class)

在此处输入图像描述

我使用@Ramnath 的答案中的代码来格式化等式。该stat_smooth_func功能不是很强大，但使用它应该不难。

https://gist.github.com/kdauria/524eade46135f6348140。ggplot2如果出现错误，请尝试更新。

score 79 · Accepted Answer

我已将 Ramnath 的帖子修改为 a) 使其更通用，因此它接受线性模型作为参数而不是数据框，并且 b) 更恰当地显示底片。

lm_eqn = function(m) {

  l <- list(a = format(coef(m)[1], digits = 2),
      b = format(abs(coef(m)[2]), digits = 2),
      r2 = format(summary(m)$r.squared, digits = 3));

  if (coef(m)[2] >= 0)  {
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,l)
  } else {
    eq <- substitute(italic(y) == a - b %.% italic(x)*","~~italic(r)^2~"="~r2,l)    
  }

  as.character(as.expression(eq));                 
}

用法将更改为：

p1 = p + geom_text(aes(x = 25, y = 300, label = lm_eqn(lm(y ~ x, df))), parse = TRUE)

score 23 · Accepted Answer

这里给大家最简单的代码

注意：显示 Pearson 的 Rho 而不是R^2。

library(ggplot2)
library(ggpubr)

df <- data.frame(x = c(1:100)
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
p <- ggplot(data = df, aes(x = x, y = y)) +
        geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
        geom_point()+
        stat_cor(label.y = 35)+ #this means at 35th unit in the y axis, the r squared and p value will be shown
        stat_regline_equation(label.y = 30) #this means at 30th unit regresion line equation will be shown

p

score 18 · Accepted Answer

使用ggpubr：

library(ggpubr)

# reproducible data
set.seed(1)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

# By default showing Pearson R
ggscatter(df, x = "x", y = "y", add = "reg.line") +
  stat_cor(label.y = 300) +
  stat_regline_equation(label.y = 280)

# Use R2 instead of R
ggscatter(df, x = "x", y = "y", add = "reg.line") +
  stat_cor(label.y = 300, 
           aes(label = paste(..rr.label.., ..p.label.., sep = "~`,`~"))) +
  stat_regline_equation(label.y = 280)

## compare R2 with accepted answer
# m <- lm(y ~ x, df)
# round(summary(m)$r.squared, 2)
# [1] 0.85

score 16 · Accepted Answer

真的很喜欢@Ramnath 解决方案。为了允许使用自定义回归公式（而不是固定为 y 和 x 作为文字变量名称），并将 p 值添加到打印输出中（正如@Jerry T 评论的那样），这里是 mod：

lm_eqn <- function(df, y, x){
    formula = as.formula(sprintf('%s ~ %s', y, x))
    m <- lm(formula, data=df);
    # formating the values into a summary string to print out
    # ~ give some space, but equal size and comma need to be quoted
    eq <- substitute(italic(target) == a + b %.% italic(input)*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue), 
         list(target = y,
              input = x,
              a = format(as.vector(coef(m)[1]), digits = 2), 
              b = format(as.vector(coef(m)[2]), digits = 2), 
             r2 = format(summary(m)$r.squared, digits = 3),
             # getting the pvalue is painful
             pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=1)
            )
          )
    as.character(as.expression(eq));                 
}

geom_point() +
  ggrepel::geom_text_repel(label=rownames(mtcars)) +
  geom_text(x=3,y=300,label=lm_eqn(mtcars, 'hp','wt'),color='red',parse=T) +
  geom_smooth(method='lm')

不幸的是，这不适用于 facet_wrap 或 facet_grid。

score 4 · Accepted Answer

受此答案中提供的方程式风格的启发，一种更通用的方法（多个预测变量 + 乳胶输出作为选项）可以是：

print_equation= function(model, latex= FALSE, ...){
    dots <- list(...)
    cc= model$coefficients
    var_sign= as.character(sign(cc[-1]))%>%gsub("1","",.)%>%gsub("-"," - ",.)
    var_sign[var_sign==""]= ' + '

    f_args_abs= f_args= dots
    f_args$x= cc
    f_args_abs$x= abs(cc)
    cc_= do.call(format, args= f_args)
    cc_abs= do.call(format, args= f_args_abs)
    pred_vars=
        cc_abs%>%
        paste(., x_vars, sep= star)%>%
        paste(var_sign,.)%>%paste(., collapse= "")

    if(latex){
        star= " \\cdot "
        y_var= strsplit(as.character(model$call$formula), "~")[[2]]%>%
            paste0("\\hat{",.,"_{i}}")
        x_vars= names(cc_)[-1]%>%paste0(.,"_{i}")
    }else{
        star= " * "
        y_var= strsplit(as.character(model$call$formula), "~")[[2]]        
        x_vars= names(cc_)[-1]
    }

    equ= paste(y_var,"=",cc_[1],pred_vars)
    if(latex){
        equ= paste0(equ," + \\hat{\\varepsilon_{i}} \\quad where \\quad \\varepsilon \\sim \\mathcal{N}(0,",
                    summary(MetamodelKdifEryth)$sigma,")")%>%paste0("$",.,"$")
    }
    cat(equ)
}

model参数需要一个lm对象，latex参数是要求简单字符或乳胶格式方程的布尔值，参数...将其值传递给format函数。

我还添加了一个将其输出为乳胶的选项，因此您可以在 rmarkdown 中使用此函数，如下所示：


```{r echo=FALSE, results='asis'}
print_equation(model = lm_mod, latex = TRUE)
```

现在使用它：

df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
df$z <- 8 + 3 * df$x + rnorm(100, sd = 40)
lm_mod= lm(y~x+z, data = df)

print_equation(model = lm_mod, latex = FALSE)

此代码产生： y = 11.3382963933174 + 2.5893419 * x + 0.1002227 * z

如果我们要求一个乳胶方程，将参数四舍五入为 3 位数：

print_equation(model = lm_mod, latex = TRUE, digits= 3)

这产生：

score 4 · Accepted Answer

另一种选择是创建一个自定义函数，使用dplyr和broom库生成方程：

get_formula <- function(model) {
  
  broom::tidy(model)[, 1:2] %>%
    mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
    mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
    mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) %>%
    summarise(formula = paste(a, collapse = '')) %>%
    as.character
  
}

lm(y ~ x, data = df) -> model
get_formula(model)
#"y ~ 6.22 + 3.16 * x"

scales::percent(summary(model)$r.squared, accuracy = 0.01) -> r_squared

现在我们需要将文本添加到绘图中：

p + 
  geom_text(x = 20, y = 300,
            label = get_formula(model),
            color = 'red') +
  geom_text(x = 20, y = 285,
            label = r_squared,
            color = 'blue')

r - 在图上添加回归线方程和 R^2

9 回答 9

Related

Reference