r - How can I plot two variables side by side in the same ggplot using geom_col?

Question

I have the following data:

data <- structure(list(id = 1:7, date = c(2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L), station = structure(1:7, .Label = c("41B004", 
"41B011", "41MEU1", "41N043", "41R001", "41R012", "41WOL1"), class = "factor"), 
    days = c(6L, 21L, 5L, 9L, 13L, 14L, 3L), mean3y = c(8.33, 
    21.3, NA, 10, 11.3, 16.3, 3.67), environ = structure(c(3L, 
    4L, 2L, 1L, 3L, 4L, 3L), .Label = c("Industriel avec influence modérée du trafic", 
    "Urbain avec faible influence du trafic", "Urbain avec influence modérée du trafic", 
    "Urbain avec très faible influence du trafic"), class = "factor")), class = "data.frame", row.names = c(NA, 
-7L))

which I plot with the following ggplot code:

ggplot(data, aes(x = reorder(station, -days), 
                 y = days, fill = environ)) + 
  geom_col(width = 0.5, colour = "black", size = 0.5) + 
  guides(fill = guide_legend(ncol = 2)) +
  geom_text(aes(label = days), 
            vjust=-0.3, color="black", size = 3.5) +
  geom_hline(aes(yintercept = 25), 
             linetype = 'dashed', colour = 'red', size = 1) +
  labs(x = '', y = bquote("Nombre de jours de dépassement de NET60" ~ O[3] ~ "en 2019")) +
  theme_minimal() + 
  theme(legend.position="bottom", legend.title = element_blank(), 
        legend.margin=margin(l = -2, unit='line'),
        legend.text = element_text(size = 11),
        axis.text.y = element_text(size = 12), 
        axis.title.y = element_text(size = 11), 
        axis.text.x = element_text(size = 11),
        panel.grid.major.x = element_blank()) + 
  geom_hline(yintercept = 0)

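This produces the figure shown.

I would also like to add the variable mean3y to this plot as a second geom_col, next to days for each x value, for example: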

p <- ggplot(data, aes(x = reorder(station, -days), 
                      y = days, fill = environ)) + 
  geom_col(width = 0.5, colour = "black", size = 0.5) + 
  guides(fill = guide_legend(ncol = 2)) +
  geom_text(aes(label = days), 
            vjust=-0.3, color="black", size = 3.5) +
  geom_col(aes(x = reorder(station, -days), 
               y = mean3y, fill = environ), 
           inherit.aes = FALSE,
           width = 0.5, colour = "black", size = 0.5) +
  geom_hline(aes(yintercept = 25), 
             linetype = 'dashed', colour = 'red', size = 1) +
  labs(x = '', y = bquote("Nombre de jours de dépassement de NET60" ~ O[3] ~ "en 2019")) +
  theme_minimal() + 
  theme(legend.position="bottom", 
        legend.title = element_blank(), 
        legend.margin=margin(l = -2, unit='line'),
        legend.text = element_text(size = 11),
        axis.text.y = element_text(size = 12), 
        axis.title.y = element_text(size = 11), 
        axis.text.x = element_text(size = 11),
        panel.grid.major.x = element_blank()) + 
  geom_hline(yintercept = 0)

However, even with position = "dodge" I cannot get the expected result: as the figure shows, the two variables overlap.

Is there a way to do this? Thanks a lot.

score 2 · Accepted Answer

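Position dodging only works within a single layer, not across multiple layers. You can solve the problem either by nudging the bars manually or by formatting the data in a way that can be dodged. Both approaches are shown in the code below.

Your data was hard to copy into my R session, and your code is more complex than needed to demonstrate the problem, so I have kept both to a minimum.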

library(ggplot2)

df <- data.frame(
  x = c("A", "B"), 
  y = c(10, 15),
  z = c(12, 9)
)

# Example of nudging
# Choose width and nudge values manually to fit your data
ggplot(df, aes(x, y)) +
  geom_col(aes(fill = "first col"), 
           width = 0.45,
           position = position_nudge(x = -0.225)) +
  geom_col(aes(y = z, fill = "second_col"), 
           width = 0.45,
           position = position_nudge(x = 0.225))


library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Example of dodging + data formatting
ggplot(mapping = aes(x, y)) +
  geom_col(data = rbind(mutate(df, a = "first_col"),
                        mutate(df, y = z, a = "second_col")),
           aes(fill = a),
           position = "dodge")

Created on 2020-04-16 by the reprex package (v0.3.0)
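The claim that dodging only works within a single layer can be checked by looking at the x positions ggplot2 computes for each layer. The sketch below is only an illustration on the same toy data; the object names (p_layers, p_single, long) are made up for this example, and layer_data() is used to read back the computed positions.

```r
library(ggplot2)

df <- data.frame(x = c("A", "B"), y = c(10, 15), z = c(12, 9))

# Two separate layers, each with position = "dodge":
# within each layer there is only one bar per x value, so nothing moves
p_layers <- ggplot(df, aes(x)) +
  geom_col(aes(y = y), position = "dodge") +
  geom_col(aes(y = z), position = "dodge")

# One layer on stacked ("long") data: two bars per x value, so they are dodged
long <- rbind(
  data.frame(x = df$x, value = df$y, var = "y"),
  data.frame(x = df$x, value = df$z, var = "z")
)
p_single <- ggplot(long, aes(x, value, fill = var)) +
  geom_col(position = "dodge")

# layer_data() returns the data of a layer after positions were applied
x_layer1 <- as.numeric(layer_data(p_layers, 1)$x)
x_layer2 <- as.numeric(layer_data(p_layers, 2)$x)
x_single <- as.numeric(layer_data(p_single, 1)$x)

print(x_layer1)
print(x_layer2)
print(x_single)
```

If the two layers report the same x positions while the single layer reports four distinct ones, the bars of the two-layer plot sit on top of each other, which matches the overlap described in the question.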

score 1 · Accepted Answer

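One way to achieve this is to reshape the data to long format, e.g. via tidyr::pivot_longer, so that the variables we want to plot become the categories of a single variable. To get the right order of the stations, I reorder station by days before converting to long format. To put the bars side by side I use position_dodge2 in both geom_col and geom_text. To show which bar corresponds to which variable, I put the variable names in the labels above the bars.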

library(ggplot2)
library(dplyr)
library(tidyr)

data1 <- data %>% 
  mutate(station = forcats::fct_reorder(station,-days)) %>% 
  pivot_longer(c(days, mean3y), names_to = "var", values_to = "value")

my_labels <- function(x) {
  gsub("(days.|mean3y.)", "", x)
}

p <- ggplot(data1, aes(x = station, y = value, fill = environ)) + 
  geom_col(position = position_dodge2(preserve = "single"), colour = "black") + 
  guides(fill = guide_legend(ncol = 2)) +
  geom_text(aes(label = paste(var, "\n", value)), position = position_dodge2(width = .9, preserve = "single"), vjust=-0.3, color="black", size = 3.5) +
  scale_x_discrete(labels = my_labels) +
  geom_hline(aes(yintercept = 25), linetype = 'dashed', colour = 'red', size = 1) +
  labs(x = '', y = bquote("Nombre de jours de dépassement de NET60" ~ O[3] ~ "en 2019")) +
  theme_minimal() + theme(legend.position="bottom", legend.title = element_blank(), legend.margin=margin(l = -2, unit='line'),
                          legend.text = element_text(size = 11),
                          axis.text.y = element_text(size = 12), axis.title.y = element_text(size = 11), 
                          axis.text.x = element_text(size = 11),
                          panel.grid.major.x = element_blank()) + geom_hline(yintercept = 0)
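To see what the reshaping step does before it reaches ggplot, here is a minimal sketch on the first three stations from the question. The data frame name toy is made up for this illustration; the values are copied from the question's data.

```r
library(tidyr)

# First three stations of the question's data (toy is a hypothetical name)
toy <- data.frame(
  station = c("41B004", "41B011", "41MEU1"),
  days    = c(6, 21, 5),
  mean3y  = c(8.33, 21.3, NA)
)

# Each station now gets one row per variable: "days" and "mean3y"
long <- pivot_longer(toy, c(days, mean3y),
                     names_to = "var", values_to = "value")
print(long)
```

Each station ends up with two rows, one per variable, which is what lets a single geom_col layer dodge them. The NA for 41MEU1 stays in the long data as a row with a missing value.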

score 1 · Accepted Answer

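Consider this possible solution for your dataset, although you may want to play around with the aesthetics. I tried to keep the aesthetics as similar as possible and set the bars to the same color (based on df$environ), but used text labels to make the distinction between "days" and "mean3y" clear.

Data preparation

First, we need to take the information from the two columns "days" and "mean3y" and combine it. In your original data frame, these two columns can (and should) be combined into one column giving the type of value and one giving the value itself. What we want to do is convert this type of data: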

  day.type.1 day.type.2
1          4          1
2          5          3
3          6          4
4          7          5

to this type of data:

    day.type day.value
1 day.type.1         4
2 day.type.1         5
3 day.type.1         6
4 day.type.1         7
5 day.type.2         1
6 day.type.2         3
7 day.type.2         4
8 day.type.2         5

For the example above, you can use the gather() function from tidyr:

t %>% gather('day.type', 'day.value')

If we apply this to your data frame, we have to tell gather() to do this for the data frame while leaving the other columns alone:

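# If your data frame also has an id column (as in the question's dput output), exclude it too with -id
df1 <- df %>% gather('variable', 'value', -date, -station, -environ)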

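This converts your "days" and "mean3y" columns into two new columns named "variable" (i.e. "days" or "mean3y") and "value" (i.e. the actual number).

I also had to convert the new "value" column to numeric... but that is probably due to how I had to import your data, which was difficult. Note that it is recommended to include your dataset in future questions as the output of dput(your.data.frame)... Trust me, it makes all the difference. ;)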
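The reshaping can be tried on the small example table from above. This is only a sketch: the name toy is made up, and the pivot_longer() call is shown as an alternative because gather() has been superseded in tidyr since version 1.0.0. It also shows the kind of check that tells you whether the value column needs converting to numeric.

```r
library(tidyr)

# The small wide table from the example (toy is a hypothetical name)
toy <- data.frame(day.type.1 = c(4, 5, 6, 7),
                  day.type.2 = c(1, 3, 4, 5))

# gather(): key column first, then value column
long_gather <- gather(toy, "day.type", "day.value")

# pivot_longer(): same content, rows come out in a different order
long_pivot <- pivot_longer(toy, c(day.type.1, day.type.2),
                           names_to = "day.type", values_to = "day.value")

# Check the type of the value column; convert only if it is not numeric
if (!is.numeric(long_gather$day.value)) {
  long_gather$day.value <- as.numeric(long_gather$day.value)
}

print(long_gather)
print(long_pivot)
```

Both calls give eight rows with the same type/value pairs; only the row order differs.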

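Plotting the new dataset

The idea here is to keep the same x axis, but we now map "value" to the y aesthetic. In addition, make sure to include the group = variable aesthetic so that the dodging works for both the text and the columns. If you are not familiar with it, "dodging" is the term for a geom being "split" within one value of an axis aesthetic, e.g. into "subsets" of a discrete axis value.

The geom_col call is set to position = 'dodge'... not much else changes there. You need this because the default position is "stack" (which is why your attempt resulted in the columns being "stacked" on top of each other).

A few things are going on in the geom_text call:

The dodge is set here with position = position_dodge(), which lets you specify how far apart the dodged elements are placed. It allowed me to "push" the labels a bit wider apart so that the text looks fine and does not run into the adjacent column. A larger width= argument to position_dodge() "pushes" the labels further apart. A value of 0 would put the labels at the center of the x position. When width is left unset (its default is NULL) it is derived from the layer itself, so for text labels it is best to set it explicitly.
The label aesthetic uses both the "variable" and "value" columns as a way to tell your columns apart. I used paste0 and stuck a '\n' in the middle so that you get two lines and they fit. I also had to adjust the size a bit.
By default, the labels are placed at y (the value), which means they would overlap your columns. You need to "nudge" them up, but you cannot use nudge_y for that, because nudge_y cannot be combined with position. What to do? Well, we can push them up by overriding the default y aesthetic and setting it to y + "a number". That works much better.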

Here is the final code:

ggplot(df1, aes(x = reorder(station, -value),
                 y = value, fill = environ,
                group=variable)) + 
    geom_col(width = 0.5, colour = "black", size = 0.5, position='dodge') + 
    guides(fill = guide_legend(ncol = 2)) +
    geom_text(aes(label = paste0(variable,'\n', value), y=value+1.5), 
              color="black", size = 3,
              position=position_dodge(0.7)) +
    geom_hline(aes(yintercept = 25), 
               linetype = 'dashed', colour = 'red', size = 1) +
    labs(x = '', y = bquote("Nombre de jours de dépassement de NET60" ~ O[3] ~ "en 2019")) +
    theme_minimal() + 
    theme(legend.position="bottom", legend.title = element_blank(), 
          legend.margin=margin(l = -2, unit='line'),
          legend.text = element_text(size = 11),
          axis.text.y = element_text(size = 12), 
          axis.title.y = element_text(size = 11), 
          axis.text.x = element_text(size = 11),
          panel.grid.major.x = element_blank()) + 
    geom_hline(yintercept = 0)
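To verify the label placement described above without rendering the full plot, a small sketch on two stations can read back the computed positions with layer_data(). The data frame long and the plot p_check are made-up names for this illustration; the values are taken from the question's data.

```r
library(ggplot2)

# Two stations in long format (long and p_check are hypothetical names)
long <- data.frame(
  station  = rep(c("41B004", "41B011"), each = 2),
  variable = rep(c("days", "mean3y"), times = 2),
  value    = c(6, 8.33, 21, 21.3)
)

p_check <- ggplot(long, aes(x = station, y = value, group = variable)) +
  geom_col(width = 0.5, position = "dodge") +
  # y is overridden so the labels sit 1.5 units above the bar tops
  geom_text(aes(label = paste0(variable, "\n", value), y = value + 1.5),
            position = position_dodge(0.7))

bars   <- layer_data(p_check, 1)  # computed bar positions
labels <- layer_data(p_check, 2)  # computed label positions

print(bars[, c("x", "y")])
print(labels[, c("x", "y", "label")])
```

The label layer should show four distinct x positions (two per station) and y values that are 1.5 above the bar heights. Changing the 0.7 passed to position_dodge() moves the labels closer together or further apart.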
