
I am trying to write a custom stat_* for ggplot2 that colors a 2D loess surface using tiles. Starting from the extension guide, I can write a stat_chull the same way they do:

stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  chull = ggproto("chull", Stat,
    compute_group = function(data, scales) {
      data[chull(data$x, data$y), , drop = FALSE]
    },
    required_aes = c("x", "y")
  )

  layer(
    stat = chull, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

This works both for a simple call and with facet wrapping:

ggplot(mpg, aes(x=displ, y=hwy)) + 
  geom_point() + 
  stat_chull()
# optionally + facet_wrap(~ class)

[Figures: stat_chull without facets; stat_chull with facets]

When I write my stat_loess2d, I can also visualize all classes together or a single class:

stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  loess2d = ggproto("loess2d", Stat,
    compute_group = function(data, scales) {
      dens = MASS::kde2d(data$x, data$y)
      lsurf = loess(fill ~ x + y, data=data)
      df = data.frame(x = rep(dens$x, length(dens$y)),
                      y = rep(dens$y, each=length(dens$x)),
                      dens = c(dens$z))
      df$fill = predict(lsurf, newdata=df[c("x", "y")])
      df
    },
    required_aes = c("x", "y", "fill")
  )

  layer(
    stat = loess2d, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

[Figures: loess2d on all classes; loess2d on one class]

However, when I try to facet the plot above, the tiles are no longer shown:

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)

[Figure: loess2d with facets, no tiles shown]

Can somebody tell me what I am doing wrong here?

1 Answer

Explanation

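The main issue I see here actually lies beyond what you've done. It comes from how geom_tile creates tiles across facets when the specific x / y axis values differ noticeably between them. An older question demonstrates a similar phenomenon: geom_tile works fine on each facet's data alone, but put the facets together and the tiles shrink to match the smallest difference between the different facets' values. This leaves vast amounts of white space in the plot layer, and it usually gets worse with each additional facet, until the tiles themselves become practically invisible.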
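To make the shrinking concrete, here is a minimal sketch with two made-up panel grids (`panel_a` and `panel_b` are hypothetical, not taken from mpg). It assumes geom_tile derives its default tile width from `ggplot2::resolution()` applied to all x values in the layer, which I believe was the behaviour when this was written; newer ggplot2 releases may compute it differently.

```r
library(ggplot2)

# Two hypothetical panels, each with its own evenly spaced x grid
panel_a <- seq(1.0, 3.0, length.out = 5)  # spacing 0.5
panel_b <- seq(1.1, 3.5, length.out = 5)  # spacing 0.6

# Smallest gap between distinct values within each panel alone
res_a <- resolution(panel_a, zero = FALSE)
res_b <- resolution(panel_b, zero = FALSE)

# Smallest gap once both panels' values are pooled: the grids
# interleave, so the gap (and hence the tile width) collapses
res_all <- resolution(c(panel_a, panel_b), zero = FALSE)

print(c(panel_a = res_a, panel_b = res_b, pooled = res_all))
```

With more facets, more grids interleave, so the pooled resolution can only stay the same or shrink.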

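To get around this, I would add a data processing step after each facet's density / loess calculations, to standardise the range of x and y values across all facets.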

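In case you aren't very familiar with the relationship between compute_layer, compute_panel and compute_group, here is some elaboration (I certainly wasn't when I started messing around with ggproto objects...):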

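  • Essentially, all Stat* objects have these three functions to bridge the gap between a given data frame (mpg in this case) and what is received on the Geom* side of things.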

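  • Of the three, compute_layer is the top-level function. It typically triggers compute_panel to calculate a separate data frame for each facet / panel (the term used in exported functions is facet, but the underlying package code calls the same thing a panel; I'm not sure why). In turn, compute_panel triggers compute_group to calculate a separate data frame for each group (as defined by the group / colour / fill / etc. aesthetic parameters).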

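  • The results from compute_group are returned to compute_panel and combined into one data frame. Likewise, compute_layer receives one data frame from each facet's compute_panel and combines them together again. The combined data frame is then passed on to Geom* to draw.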

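(The above is the generic set-up as defined in the top-level Stat. Other Stat* objects that inherit from Stat may override the behaviour at any of the steps. For example, StatIdentity's compute_layer returns the original data frame as is, without triggering compute_panel / compute_group at all, because there is no need to do so for unchanged data.)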
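The call order described above can be watched directly with a small tracing stat. This is only a sketch: `StatTrace` is a hypothetical name, and it does nothing except announce each call and hand the data back unchanged.

```r
library(ggplot2)

# Hypothetical stat that reports when each level is called
StatTrace <- ggproto("StatTrace", Stat,
  compute_group = function(data, scales) {
    message("  compute_group: PANEL ", data$PANEL[1],
            ", group ", data$group[1], ", ", nrow(data), " rows")
    data  # return the group's rows untouched
  },
  compute_panel = function(self, data, scales, ...) {
    message("compute_panel: PANEL ", data$PANEL[1])
    # defer to the default, which splits by group and recombines
    ggproto_parent(Stat, self)$compute_panel(data, scales, ...)
  },
  required_aes = c("x", "y")
)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(stat = StatTrace) +
  facet_wrap(~ year)

# Building the layer data runs compute_layer, which drives the calls above
out <- layer_data(p, 1)
```

Each compute_panel message should be followed by one compute_group message per colour group in that panel, and `out` is the recombined data frame that the geom receives.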

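For this use case, we can modify the code in compute_layer, after the results from compute_panel / compute_group have been returned and combined, to interpolate the values associated with each facet onto common bins. Common bins = nice large tiles with no white space in between.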
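As a one-dimensional analogue of that idea, the sketch below resamples two made-up panels onto one shared grid with base R's `approx()`. The data and names are hypothetical, and `approx()` stands in for the 2D interpolation a real surface needs; `rule = 1` leaves NA wherever a panel has no data of its own instead of extrapolating.

```r
# Hypothetical per-panel values sampled on different x grids
panels <- list(
  a = data.frame(x = seq(1, 3, length.out = 5)),
  b = data.frame(x = seq(2, 6, length.out = 5))
)
panels$a$fill <- panels$a$x^2
panels$b$fill <- 10 - panels$b$x

# One grid spanning the x range of all panels combined
all_x <- unlist(lapply(panels, `[[`, "x"))
grid <- seq(min(all_x), max(all_x), length.out = 11)

# Linear interpolation of each panel onto the shared grid;
# rule = 1 returns NA outside that panel's own x range
common <- lapply(panels, function(p) {
  data.frame(x = grid,
             fill = approx(p$x, p$fill, xout = grid, rule = 1)$y)
})

str(common)
```

Every panel now shares the same x positions, so the smallest gap between values is the grid step, whatever the number of panels.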

Modification

Here is how I would write the loess2d ggproto object, with an additional definition for compute_layer:

loess2d = ggproto("loess2d", Stat,
                  compute_group = function(data, scales) {
                    dens = MASS::kde2d(data$x, data$y)
                    lsurf = loess(fill ~ x + y, data=data)
                    df = data.frame(x = rep(dens$x, length(dens$y)),
                                    y = rep(dens$y, each=length(dens$x)),
                                    dens = c(dens$z))
                    df$fill = predict(lsurf, newdata=df[c("x", "y")])
                    df
                  },
                  compute_layer = function(self, data, params, layout) {
                    # no change from Stat$compute_layer in this chunk, except
                    # for liberal usage of `ggplot2:::` to utilise un-exported
                    # functions from the package
                    ggplot2:::check_required_aesthetics(self$required_aes, 
                                                        c(names(data), names(params)), 
                                                        ggplot2:::snake_class(self))
                    data <- remove_missing(data, params$na.rm, 
                                           c(self$required_aes, self$non_missing_aes), 
                                           ggplot2:::snake_class(self),
                                           finite = TRUE)
                    params <- params[intersect(names(params), self$parameters())]
                    args <- c(list(data = quote(data), scales = quote(scales)), params)
                    df <- plyr::ddply(data, "PANEL", function(data) {
                      scales <- layout$get_scales(data$PANEL[1])
                      tryCatch(do.call(self$compute_panel, args), 
                               error = function(e) {
                                 warning("Computation failed in `", ggplot2:::snake_class(self), 
                                         "()`:\n", e$message, call. = FALSE)
                                 data.frame()
                               })
                    })

                    # define common x/y grid range across all panels
                    # (length = 25 chosen to match the default value for n in MASS::kde2d)
                    x.range <- seq(min(df$x), max(df$x), length = 25)
                    y.range <- seq(min(df$y), max(df$y), length = 25)

                    # interpolate each panel's data to a common grid,
                    # with NA values for regions where each panel doesn't
                    # have data (this can be changed via the extrap
                    # parameter in akima::interp, but I think  
                    # extrapolating may create misleading visuals)
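                    # (the pipe %>% needs dplyr or magrittr to be attached;
                    # nest() / unnest() use the tidyr >= 1.0.0 syntax)
                    df <- df %>% 
                      tidyr::nest(data = -PANEL) %>%
                      dplyr::mutate(data = purrr::map(data, 
                                               ~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
                                                              xo = x.range, yo = y.range,
                                                              nx = 25, ny = 25) %>%
                                                 akima::interp2xyz(data.frame = TRUE) %>%
                                                 dplyr::rename(fill = z))) %>%
                      tidyr::unnest(cols = data)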

                    return(df)
                  },
                  required_aes = c("x", "y", "fill")
)

Usage:

library(dplyr)  # provides %>% and filter(), used in the stat above and below

ggplot(mpg,
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue

ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# no warnings triggered

Result

Answered 2019-08-17T16:20:09.243