我正在写这个答案,部分是为了后代,部分是因为我一直想为其他一些试图在 R 中进行自定义可视化的人写一些这样的函数。
背景
在 R 中,许多人正确地抛弃了基础绘图函数,开始转向更灵活的包装程序包,“lattice”和“ggplot2”。这些是通过在单个图上应用逻辑层来快速探索数据的强大工具。然后,这些包会处理所有层并生成一个图窗口,并适当地安排。这些包很棒,我建议每个 R 用户至少学习其中一个。
不过需要注意的是,'lattice' 和 'ggplot2' 包实际上更多地用于数据探索,而不是智能数据可视化。在创建自定义数据可视化时,这些包为您做出了太多决定,因为这就是包装器的用途:让您无法做出一些决定。
自定义可视化?输入“网格”
基本的“网格”包是绘图灵活性的终极选择,部分原因是它扩展了基本绘图功能的功能,而不是包装它们。使用“网格”功能,我们获得了使用各种不同的放置和大小单位创建视觉对象的能力,并且(这非常重要)我们获得了为对象的锚点使用理由的能力。如果您想学习,Paul Murrell 的书“R 图形”是一个极好的资源。它的副本放在我的桌子上。
如果您曾经使用过矢量图形绘图程序(如 Illustrator 或 Inkscape),那么当我提到理由时,您可能已经知道我在说什么。这是通过引用其他项目的位置来排列项目的能力。我会更多地谈论这个,但我可以整天谈论它。让我们继续这个过程。
过程
现在,我应该先说我花了大约两个小时来编写函数库,大约 5 分钟来编写演示代码。以后我会使用函数库作为训练工具,任何人都可以随意使用/修改它。
“网格”过程分为三个基本步骤:
- 制作视口
- 绘制一些对象
- 弹出你的视口
在制作视口时,我们使用“pushViewport”来推送“视口”对象,如下所示:
pushViewport(viewport(x=0, y=1, xscale=c(1, 10), yscale=c(0, 100), width=0.25, height=0.25, default.units="npc", just=c("left","bottom"), clip="off"))
基本视口有一组“npc”单位,其中 x 从 0 到 1,从左到右,y 从 0 到 1,从下到上。这意味着原点位于左下角。上面的视口被创建为左下角绘图的四分之一。但是,当我们指定“xscale”和“yscale”时,我们可以在绘制对象时引用“native”单位。这意味着我们可以使用“本机”单位来绘制数据,并在绘制轴和标签等内容时使用“npc”单位。
绘制对象时,我们使用'grid.lines'、'grid.polygon'、'grid.points'、'grid.circle'等函数。我所做的每一个可视化都使用了这些对象。当您通过手动指定这些对象来绘制数据时,您可以获得大量的控制权。填充折线图是添加功能的最明显示例之一。填充区域只是一个多边形,其中包含由数据指定的多边形的点并添加了两个锚点。我用它来突出显示折线图的区域,或者更容易阅读同一图表上的多条线。
您还可以发挥创意,例如,创建不是矩形的条形,或者以更复杂的方式组合多个绘图。我和其他一些人最近运行了一个科幻主题的步行游戏,我们使用自定义图表(用“网格”制作)来展示我们的最终表现。该图表将“幸存者”团队的天数组合为时间轴,将玩家与敌人每天的步数显示为条形图,并将累积的玩家和敌人每天的步数显示为实心折线图。我很难使用“lattice”或“ggplot2”包来创建类似的视觉效果。
这是其中一个图表的示例(没有现实生活中的玩家姓名),以了解“网格”视觉效果的灵活性:
问题的概念证明
现在具体解决OP提出的问题。在问题中,OP 暗示他/她将在每个区域内绘制图表。使用预构建的绘图包时,这可能会变得很棘手,因为大多数绘图功能会覆盖您已经设置的任何绘图规范。相反,使用诸如基本“网格”函数之类的东西来指定绘图区域,然后在视口中绘制必要的数据对象更为可靠。
为了避免工作太辛苦,我首先编写了一个自定义函数库,用于设置我的各种图表参数并为我绘制每种类型的图表。我不喜欢调试代码,所以函数是我处理事情的方式。每次我得到一段正确的代码时,我都会把它扔到一个函数中供以后使用。
代码可能看起来有点复杂,但请记住三个“网格”步骤:push viewport、draw、pop viewport。这就是每个函数正在做的事情。为了演示这项工作,我制作了四种不同的绘图功能:填充折线图、散点图、直方图和 OP 建议的方框图。每个函数都足够灵活,可以在每个图表中容纳多组数据值,设置 alpha 值以进行补偿,并允许我们查看绘制在彼此之上的值。
在这种情况下,您只需要使您的函数尽可能灵活,因此我确实采用了一条捷径,并从演示中的一些代码中提取了它们,这些代码做了很多假设。不过,我仍然使用逻辑驱动的代码来绘制它,以演示如何使用简单的逻辑绘制更复杂的对象。
这是演示代码的结果,使用一些内置的 R 数据集来获取简单数据(EuStockMarkets、nottem、sunspots.month):
自定义函数库:
library(grid)
# Specify general chart options.
chart_Fill = "lemonchiffon"
chart_Col = "snow3"
space_Background = "white"
title_CEX = 0.8
axis_CEX = 0.6
chart_Width <- 3/3
chart_Height <- 2/5
# Function to initialize a plotting area.
init_Plot <- function(
.df,
.x_Loc,
.y_Loc,
.justify,
.width,
.height
){
# Initialize plotting area to fit data.
# We have to turn off clipping to make it
# easy to plot the labels around the plot.
pushViewport(viewport(xscale=c(min(.df[,1]), max(.df[,1])), yscale=c(min(0,min(.df[,-1])), max(.df[,-1])), x=.x_Loc, y=.y_Loc, width=.width, height=.height, just=.justify, clip="off", default.units="native"))
# Color behind text.
grid.rect(x=0, y=0, width=unit(axis_CEX, "lines"), height=1, default.units="npc", just=c("right", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
# Color in the space.
grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))
}
# Function to finalize and label a plotting area.
finalize_Plot <- function(
.df,
.plot_Title
){
# Label plot using the internal reference
# system, instead of the parent window, so
# we always have perfect placement.
grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))
grid.text(paste(names(.df)[-1], collapse=" & "), x=-0.05, y=0.5, just=c("center","bottom"), rot=90, default.units="npc", gp=gpar(cex=axis_CEX))
grid.text(names(.df)[1], x=0.5, y=-0.05, just=c("center","top"), rot=0, default.units="npc", gp=gpar(cex=axis_CEX))
# Finalize plotting area.
popViewport()
}
# Function to plot a filled line chart of
# the data in a data frame. The first column
# of the data frame is assumed to be the
# plotting index, with each column being a
# set of y-data to plot. All data is assumed
# to be numeric.
plot_Line_Chart <- function(
.df,
.x_Loc,
.y_Loc,
.justify,
.width,
.height,
.colors,
.plot_Title
){
# Initialize plot.
init_Plot(.df, .x_Loc, .y_Loc, .justify, .width, .height)
# Calculate what value to use as the
# return for the polygons.
y_Axis_Min <- min(0, min(.df[,-1]))
# Plot each set of data as a polygon,
# so we can fill it in with color to
# make it easier to read.
for (i in 2:ncol(.df)){
grid.polygon(x=c(min(.df[,1]),.df[,1], max(.df[,1])), y=c(y_Axis_Min,.df[,i], y_Axis_Min), default.units="native", gp=gpar(fill=.colors[i-1], col=.colors[i-1], alpha=1/ncol(.df)))
}
# Draw plot axes.
grid.lines(x=0, y=c(0,1), default.units="npc")
grid.lines(x=c(0,1), y=0, default.units="npc")
# Finalize plot.
finalize_Plot(.df, .plot_Title)
}
# Function to plot a scatterplot of
# the data in a data frame. The
# assumptions are the same as 'plot_Line_Chart'.
plot_Scatterplot <- function(
.df,
.x_Loc,
.y_Loc,
.justify,
.width,
.height,
.colors,
.plot_Title
){
# Initialize plot.
init_Plot(.df, .x_Loc, .y_Loc, .justify, .width, .height)
# Plot each set of data as colored points.
for (i in 2:ncol(.df)){
grid.points(x=.df[,1], y=.df[,i], pch=19, size=unit(1, "native"), default.units="native", gp=gpar(col=.colors[i-1], alpha=1/ncol(.df)))
}
# Draw plot axes.
grid.lines(x=0, y=c(0,1), default.units="npc")
grid.lines(x=c(0,1), y=0, default.units="npc")
# Finalize plot.
finalize_Plot(.df, .plot_Title)
}
# Function to plot a histogram of
# all the columns in a data frame,
# except the first, which is assumed to
# be an index.
plot_Histogram <- function(
.df,
.x_Loc,
.y_Loc,
.justify,
.width,
.height,
.colors,
.plot_Title,
...
){
# Create a list containing the histogram
# data for each data column and calculate
# data ranges. Any extra parameters
# specified will pass to the 'hist' function.
hist_Data <- list()
hist_Count_Range <- c(0,NA)
hist_Breaks_Range <- c(NA,NA)
for (i in 2:ncol(.df)){
hist_Data[[i]] <- hist(.df[,i], plot=FALSE, ...)
hist_Count_Range[2] <- max(max(hist_Data[[i]]$counts), hist_Count_Range[2], na.rm=TRUE)
hist_Breaks_Range <- c(min(min(hist_Data[[i]]$breaks), hist_Breaks_Range[1], na.rm=TRUE), max(max(hist_Data[[i]]$breaks), hist_Breaks_Range[2], na.rm=TRUE))
}
# Initialize plotting area to fit data.
# We are doing this in a custom way to
# allow more flexibility than built into
# the 'init_Plot' function.
# We have to turn off clipping to make it
# easy to plot the labels around the plot.
pushViewport(viewport(xscale=hist_Breaks_Range, yscale=hist_Count_Range, x=.x_Loc, y=.y_Loc, width=.width, height=.height, just=.justify, clip="off", default.units="native"))
# Color behind text.
grid.rect(x=0, y=0, width=unit(axis_CEX, "lines"), height=1, default.units="npc", just=c("right", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
# Color in the space.
grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))
# Draw x axis.
grid.lines(x=c(0,1), y=0, default.units="npc")
# Plot each set of data as a histogram.
for (i in 2:ncol(.df)){
grid.rect(x=hist_Data[[i]]$mids, y=0, width=diff(hist_Data[[i]]$mids[1:2]), height=hist_Data[[i]]$counts, default.units="native", just=c("center","bottom"), gp=gpar(fill=.colors[i-1], col=.colors[i-1], alpha=1/ncol(.df)))
}
# Label plot using the internal reference
# system, instead of the parent window, so
# we always have perfect placement.
grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))
grid.text(paste(names(.df)[-1], collapse=" & "), x=-0.05, y=0.5, just=c("center","bottom"), rot=90, default.units="npc", gp=gpar(cex=axis_CEX))
# Finalize plotting area.
popViewport()
}
draw_Sample_Box <- function(
.x_Loc,
.y_Loc,
.x_Scale,
.y_Scale,
.justify,
.width,
.height,
.colors,
.box_X,
.box_Y,
.plot_Title
){
pushViewport(viewport(xscale=.x_Scale, yscale=.y_Scale, x=.x_Loc, y=.y_Loc, width=chart_Width, height=chart_Height, just=.justify, clip="off", default.units="native"))
# Color behind text.
grid.rect(x=0, y=1, width=1, height=unit(title_CEX, "lines"), default.units="npc", just=c("left", "bottom"), gp=gpar(fill=space_Background, col=space_Background))
# Color in the space.
grid.rect(gp=gpar(fill=chart_Fill, col=chart_Col))
# Label plot.
grid.text(.plot_Title, x=0.5, y=1.05, just=c("center","bottom"), rot=0, default.units="npc", gp=gpar(cex=title_CEX))
# Draw box and label points.
grid.polygon(x=.box_X, y=.box_Y, default.units="native", gp=gpar(fill=.colors[1], col=.colors[2]))
grid.text(paste(.plot_Title, 1, sep=""), x=min(.box_X), y=min(.box_Y), default.units="native", just=c("right","top"), gp=gpar(cex=0.5))
grid.text(paste(.plot_Title, 2, sep=""), x=max(.box_X), y=min(.box_Y), default.units="native", just=c("left","top"), gp=gpar(cex=0.5))
# Finalize plot.
popViewport()
}
演示代码:
# Draw twelve independent charts as
# a demo and connect with lines similar
# to a heiritage chart.
grid.newpage()
# Initialize a viewport to make our locations
# easier to map.
pushViewport(viewport(x=0, y=0, width=1, height=1, just=c("left","bottom"), xscale=c(0,10), yscale=c(0,4)))
# Color background of overall plot.
grid.rect(gp=gpar(fill=space_Background, col=space_Background))
# Store plot locations for convenience.
plot_Loc <- data.frame(x=c(2,4,6,8,1,3,7,9,2,4,6,8), y=c(3,3,3,3,2,2,2,2,1,1,1,1))
# Draw connecting lines.
connections <- data.frame(a=c(1, 3, 5, 6, 7, 1, 3, 5, 7, 6), b=c(2, 4, 6, 7, 8, 2, 4, 6, 8, 7), c=c(NA, NA, NA, NA, NA, 6, 7, 9, 12, 10), d=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 11))
for (i in 1:nrow(connections)){
if (is.na(connections$c[i])){
grid.lines(x=plot_Loc$x[unlist(connections[i,1:2])], y=plot_Loc$y[unlist(connections[i,1:2])], default.units="native")
} else if (is.na(connections$d[i])) {
grid.lines(x=median(plot_Loc$x[unlist(connections[i,1:2])]), y=plot_Loc$y[unlist(connections[i,2:3])], default.units="native")
} else {
grid.lines(x=median(plot_Loc$x[unlist(connections[i,1:2])]), y=c(plot_Loc$y[connections[i,2]], median(plot_Loc$y[unlist(connections[i,2:3])])), default.units="native")
grid.lines(x=plot_Loc$x[unlist(connections[i,3:4])], y=median(plot_Loc$y[unlist(connections[i,2:3])]), default.units="native")
grid.lines(x=plot_Loc$x[connections[i,3]], y=c(median(plot_Loc$y[unlist(connections[i,2:3])]), plot_Loc$y[connections[i,3]]), default.units="native")
grid.lines(x=plot_Loc$x[connections[i,4]], y=c(median(plot_Loc$y[unlist(connections[i,2:3])]), plot_Loc$y[connections[i,4]]), default.units="native")
}
}
# Draw four independent line charts.
p <- 1
plot_Line_Chart(data.frame(time=1:1860, EuStockMarkets)[1:3], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("dodgerblue", "deeppink"), "EU Stocks")
p <- 2
plot_Line_Chart(data.frame(time=1:1860, EuStockMarkets)[c(1,4,5)], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("green", "purple"), "EU Stocks")
p <- 3
plot_Line_Chart(data.frame(time=1:(12*20), sunspots=sunspot.month[(171*12+1):(171*12+12*20)]), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("darkgoldenrod"), "Sunspots")
p <- 4
plot_Line_Chart(data.frame(time=1:(12*20), temp=nottem), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("red"), "Nottem")
# Draw four independent scatterplots.
p <- 5
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 1000), DAX=rowMeans(embed(EuStockMarkets[,1], 1000)), FTSE=rowMeans(embed(EuStockMarkets[,4], 1000))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth")
p <- 6
plot_Scatterplot(data.frame(time=1:1860, EuStockMarkets)[c(1,2,5)], .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "EU Stocks")
p <- 9
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 20), DAX=rowMeans(embed(EuStockMarkets[,1], 20)), FTSE=rowMeans(embed(EuStockMarkets[,4], 20))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth*20")
p <- 10
plot_Scatterplot(data.frame(time=1:(1860 + 1 - 100), DAX=rowMeans(embed(EuStockMarkets[,1], 100)), FTSE=rowMeans(embed(EuStockMarkets[,4], 100))), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("deeppink", "purple"), "Smooth*100")
# Draw two independent histograms.
p <- 7
plot_Histogram(data.frame(time=1:(12*20), sunspots=sunspot.month[(171*12+1):(171*12+12*20)]), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("darkgoldenrod"), "Sunspots", breaks=6)
p <- 8
plot_Histogram(data.frame(time=1:(12*20), temp=nottem), .x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .just=c("center","center"), .width=chart_Width, .height=chart_Height, c("red"), "Nottem", breaks=6)
# Draw sample objects in two charts spaces.
p <- 11
draw_Sample_Box(.x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .x_Scale=c(0,10), .y_Scale=c(-10,0), .justify=c("center","center"), .width=chart_Width, .height=chart_Height, .colors=c("dodgerblue","blue"), .box_X=c(4,6,6,4), .box_Y=c(-4,-4,-5,-5), .plot_Title="K")
p <- 12
draw_Sample_Box(.x_Loc=plot_Loc$x[p], .y_Loc=plot_Loc$y[p], .x_Scale=c(-1,1), .y_Scale=c(0,1), .justify=c("center","center"), .width=chart_Width, .height=chart_Height, .colors=c("dodgerblue","blue"), .box_X=c(-0.5,0,0,-0.5), .box_Y=c(0.8,0.8,0.7,0.7), .plot_Title="L")