1

我想以与拟合线性回归geom_quantile()类似的方式包含拟合线的相关统计数据geom_smooth(method="lm")(我以前使用过ggpmisc,这很棒)。例如,这段代码:

# quantile regression example with ggpmisc equation
# basic quantile code from here:
# https://ggplot2.tidyverse.org/reference/geom_quantile.html

library(tidyverse)
library(ggpmisc)
# see ggpmisc vignette for stat_poly_eq() code below:
# https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html#stat_poly_eq

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# linear ols regression with equation labelled
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

m + 
  geom_smooth(method = "lm", formula = my_formula) +
  stat_poly_eq(aes(label =  paste(stat(eq.label), "*\" with \"*", 
                                  stat(rr.label), "*\", \"*", 
                                  stat(f.value.label), "*\", and \"*",
                                  stat(p.value.label), "*\".\"",
                                  sep = "")),
               formula = my_formula, parse = TRUE, size = 3)  

生成这个: 带有线性ols方程的ggplot

对于分位数回归,您可以换出geom_smooth()geom_quantile()绘制一条可爱的分位数回归线(在本例中为中位数):

# quantile regression - no equation labelling
m + 
  geom_quantile(quantiles = 0.5)
  

geom_quantile 图

您将如何将摘要统计信息发送到标签,或者在旅途中重新创建它们?(即除了在调用 ggplot 之前进行回归,然后将其传递给然后进行注释(例如,类似于此处此处为线性回归所做的事情?

4

2 回答 2

1

@mark-nealstat_fit_glance()确实适用于quantreg::rq(). 然而,使用stat_fit_glance()更复杂。此统计信息不“知道”会发生什么glance(),因此必须label手动组装。

人们需要知道什么是可用的。可以在 ggplot 之外运行拟合模型并使用glance()它来找出它返回的列,或者可以在 ggplot 中借助包“gginnards”来执行此操作。我将展示这个替代方案,从上面的代码示例继续。

library(gginnards)

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), geom = "debug")

geom_debug()默认情况下只是将其输入打印到 R 控制台,其输入是统计信息返回的内容。

# A tibble: 1 x 11
   npcx  npcy   tau logLik    AIC    BIC df.residual     x      y PANEL group
  <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>       <int> <dbl>  <dbl> <fct> <int>
1    NA    NA   0.5   816. -1628. -1621.         232  1.87 0.0803 1        -1

after_stat()我们可以使用(早期的化身是stat()并包含名称来访问每一列...。我们需要使用的编码符号进行格式化sprintf()。如果在这种情况下我们组装一个需要解析为表达式的字符串,parse = TRUE也是需要。

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), 
                  mapping = aes(label = sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
                                                after_stat(tau), after_stat(AIC), after_stat(BIC))),
                  parse = TRUE)

此示例导致以下绘图。 在此处输入图像描述

使用stat_fit_tidy()相同的方法应该有效。然而,在 'ggpmisc' (<= 0.3.7) 中,它与 "lm" 一起工作,但与 "rq" 一起工作。此错误已在 'ggpmisc' (>= 0.3.8) 中修复,现在在 CRAN 中。

下面的示例仅适用于 'ggpmisc' (>= 0.3.8)

剩下的问题是tibblethatglance()tidy()return 是否包含人们想要添加到情节中的信息,这似乎不是 的情况tidy.qr(),至少在默认情况下是这样。但是,tidy.rq()有一个参数se.type确定在tibble. 修改后的stat_fit_tidy()接受命名参数传递给tidy(),使以下成为可能。

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_tidy(method = "rq",
                method.args = list(formula = y ~ x), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE)

此示例导致以下绘图。

在此处输入图像描述

定义一个新的统计数据stat_rq_eq()会使这更简单:

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                ...)
}

答案变成:

m + 
  geom_quantile(quantiles = 0.5) +
  stat_rq_eqn(tau = 0.5)
于 2021-01-21T18:16:48.183 回答
1

请将此视为 Pedro 出色答案的附录,他在其中完成了大部分繁重的工作 - 这增加了一些演示调整(颜色和线型)和代码以简化多个分位数,生成下图:

分位数图

library(tidyverse)
library(ggpmisc) #ensure version 0.3.8 or greater
library(quantreg)
library(generics)

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# base plot
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

# function for labelling
# Doesn't neatly handle P values (e.g return "P<0.001 where appropriate)

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, colour = "red", label.y = 0.9, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('italic(tau)~"="~%.3f~";"~y~"="~%.3g~+~%.3g~x*", with "~italic(P)~"="~%.3g',
                                              after_stat(x_tau),
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                colour = colour,
                label.y = label.y,
                ...)
}

# This works, though with double entry of plot specs
# custom colours and linetype
# https://stackoverflow.com/a/44383810/4927395
# https://stackoverflow.com/a/64518380/4927395


m + 
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
                )+
  scale_color_manual(values = c("red","purple","darkgreen"))+
  scale_linetype_manual(values = c("dotted", "dashed", "solid"))+
  stat_rq_eqn(tau = 0.1, colour = "red", label.y = 0.9)+
  stat_rq_eqn(tau = 0.5, colour = "purple", label.y = 0.95)+
  stat_rq_eqn(tau = 0.9, colour = "darkgreen", label.y = 1.0)+
  theme(legend.position = "none") # suppress legend


# not a good habit to have double entry above
# modified with reference to tibble for plot specs, 
# though still a stat_rq_eqn call for each quantile and manual vertical placement
# https://www.r-bloggers.com/2019/06/curly-curly-the-successor-of-bang-bang/

my_tau = c(0.1, 0.5, 0.9)
my_colours = c("red","purple","darkgreen")
my_linetype = c("dotted", "dashed", "solid")

quantile_plot_specs <- tibble(my_tau, my_colours, my_linetype)

m + 
  geom_quantile(quantiles = {{quantile_plot_specs$my_tau}}, 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
  )+
  scale_color_manual(values = {{quantile_plot_specs$my_colours}})+
  scale_linetype_manual(values = {{quantile_plot_specs$my_linetype}})+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[1]}}, colour = {{quantile_plot_specs$my_colours[1]}}, label.y = 0.9)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[2]}}, colour = {{quantile_plot_specs$my_colours[2]}}, label.y = 0.95)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[3]}}, colour = {{quantile_plot_specs$my_colours[3]}}, label.y = 1.0)+
  theme(legend.position = "none")
于 2021-01-29T03:46:51.313 回答