Suppose I have some data that looks a bit like this:

library(ggplot2)
library(dplyr)

# 10 names and 2 quality labels; data.frame() recycles both to 4000 rows
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x <- runif(4000, 0, 100)
y <- runif(4000, 0, 100)
employ.data <- data.frame(employee, quality, x, y)

I am using a geom_bin2d plot that looks like this:

ggplot(employ.data, aes(x, y)) +
  geom_bin2d(binwidth = c(20, 20)) +
  scale_fill_gradient2(low="darkred", high = "darkgreen")

<a href="https://i.stack.imgur.com/5p9n6.png" rel="nofollow noreferrer">plot</a>

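How can I change the bin colours to reflect the percentage of "bad" x/y points compared with the overall average for that region across the whole dataset? That is, if the average of "bad" points in the bottom-left bin is some number x, and John's average in that region is a lower number y, how can I make the bin colour darker to show that his count is lower?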

I think this creates the averages:

df2 <-  employ.data
df2$xbin <- cut(df2$x, breaks = seq(0, 100, by = 20))
df2$ybin <- cut(df2$y, breaks = seq(0, 100, by = 20))
df2 <- df2 %>% group_by(xbin, ybin) %>% mutate(ave_pct = mean(quality == "bad"))
df2 <- df2 %>% group_by(employee, xbin, ybin) %>%  mutate(person_pct = mean(quality == "bad"))

But then I don't know how to plot it.

1 Answer

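So, if I understand you correctly, you want to colour the bins according to how each bin's percentage of "bad" employees compares with the overall percentage of "bad" employees. To do that, I changed the calculation to: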

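df <- employ.data %>%
  mutate(xbin = cut(x, breaks = seq(0, 100, by = 20)),
         ybin = cut(y, breaks = seq(0, 100, by = 20)),
         overall_ave = mean(quality == "bad")) %>%   # share of "bad" in the whole dataset
  group_by(xbin, ybin) %>%
  mutate(bin_ave = mean(quality == "bad")) %>%        # share of "bad" within each bin
  ungroup() %>%
  mutate(bin_quality = overall_ave - bin_ave)         # positive = fewer "bad" than overall

This creates the bins and then finds the overall percentage of "bad" quality employees. It then groups by bin and finds the percentage of "bad" employees in each bin. Finally, it compares each bin average with the overall average by subtracting the bin average from the overall one. As a result, bin_quality is positive for bins with a higher percentage of "good" employees and negative for bins with a higher percentage of "bad" employees, which matches the darkred-to-darkgreen fill scale used in the plot.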
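To inspect the per-bin numbers before plotting, the same calculation can be collapsed to one row per bin. The following is only a minimal sketch: the `toy` data frame, the fixed seed, and the names `bin_summary` and `n` are assumptions made for illustration, and the sign convention is that a positive bin_quality means a bin has a smaller share of "bad" than the dataset as a whole.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in for employ.data, with quality drawn at random
toy <- data.frame(
  quality = sample(c("good", "bad"), 4000, replace = TRUE),
  x = runif(4000, 0, 100),
  y = runif(4000, 0, 100)
)

# Share of "bad" over the whole dataset
overall_ave <- mean(toy$quality == "bad")

# One row per 20 x 20 bin: point count, share of "bad", and the difference
bin_summary <- toy %>%
  mutate(xbin = cut(x, breaks = seq(0, 100, by = 20)),
         ybin = cut(y, breaks = seq(0, 100, by = 20))) %>%
  group_by(xbin, ybin) %>%
  summarise(n = n(),
            bin_ave = mean(quality == "bad"),
            .groups = "drop") %>%
  mutate(bin_quality = overall_ave - bin_ave)

print(bin_summary)
```

Because the overall share is the count-weighted mean of the bin shares, the count-weighted sum of bin_quality should come out as zero (up to floating-point error), which is a quick sanity check on the sign and the grouping.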

You can then pass fill = bin_quality and group = bin_quality through aes() in ggplot. You also need to add aes(group = bin_quality) to your geom_bin2d call. It looks like this:

ggplot(df, aes(x, y, fill = bin_quality, group = bin_quality)) +
  geom_bin2d(aes(group = bin_quality), binwidth = c(20, 20)) +
  scale_fill_gradient2(low="darkred", high = "darkgreen") 

This gives you a chart in which each 20 × 20 bin is filled according to its bin_quality value.
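The question also asks about comparing one employee (for example John) with the average for the same bin. One possible way to do that, sketched here under assumptions rather than taken from the approach above, is to summarise per employee and per bin, join the bin-wide average back on, and draw the result with geom_tile() and facet_wrap(). The `toy` data, the seed, and the names `bin_ave`, `person_vs_bin` and `diff` are all hypothetical; diff is defined so that positive values mean the employee has a smaller share of "bad" than the bin as a whole.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in for employ.data with three employees
toy <- data.frame(
  employee = sample(c("John", "Dave", "Paul"), 3000, replace = TRUE),
  quality  = sample(c("good", "bad"), 3000, replace = TRUE),
  x = runif(3000, 0, 100),
  y = runif(3000, 0, 100)
)

binned <- toy %>%
  mutate(xbin = cut(x, breaks = seq(0, 100, by = 20)),
         ybin = cut(y, breaks = seq(0, 100, by = 20)))

# Share of "bad" in each bin across all employees
bin_ave <- binned %>%
  group_by(xbin, ybin) %>%
  summarise(ave_pct = mean(quality == "bad"), .groups = "drop")

# Share of "bad" in each bin for each employee, compared with the bin average
person_vs_bin <- binned %>%
  group_by(employee, xbin, ybin) %>%
  summarise(person_pct = mean(quality == "bad"), .groups = "drop") %>%
  left_join(bin_ave, by = c("xbin", "ybin")) %>%
  mutate(diff = ave_pct - person_pct)

# One panel per employee; tiles are red where the employee is worse than the bin average
p <- ggplot(person_vs_bin, aes(xbin, ybin, fill = diff)) +
  geom_tile() +
  scale_fill_gradient2(low = "darkred", high = "darkgreen") +
  facet_wrap(~ employee)

print(p)
```

Working from a pre-summarised table with geom_tile() avoids relying on geom_bin2d to carry a per-row fill value through its binning step, at the cost of showing the bins as factor levels on the axes instead of a continuous scale.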

answered 2017-10-04T17:43:08.073